Exercises and associated data

The data and modelling objects created in this notebook can be downloaded directly to save computational time.


Users who wish to complete the exercises can download a small template R script. Assuming you have already downloaded the data objects above, this script will load all data objects so that the steps used to create them are not necessary to tackle the exercises.

Load libraries and time series data

This tutorial relates to content covered in Lecture 5, and relies on the following packages for manipulating data, shaping time series, fitting dynamic regression models and plotting:

library(dplyr)
library(mvgam) 
library(tidybayes)
library(bayesplot)
library(gratia)
library(ggplot2)
library(marginaleffects)

Life cycle of the blacklegged tick Ixodes scapularis; photo Centers for Disease Control and Prevention


The data we will work with in this tutorial come from the National Ecological Observatory Network (NEON) tick drag cloth samples. Ixodes scapularis is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised long-term monitoring of tick abundances as well as other important indicators of ecological change.


Map of field sampling sites used by the NEON program; image: Weiglein et al 2021


Nymphal abundance of I. scapularis is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harboring ticks, usually working in a grid-like pattern.


The drag cloth method used by NEON field teams to survey for ticks; photo National Ecological Observatory Network


The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs.

Manipulating data for modelling

We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in Clark & Wells 2022. You can read a bit about the data using the call ?mvgam::all_neon_tick_data

data("all_neon_tick_data")

Below is a convenience function to prepare the NEON Ixodes scapularis nymph survey data for modelling by filtering to the correct sites and getting data into the right long-format structure that we are used to using for mvgam models

prep_neon_data = function(split_prop = 0.8){
    plotIDs <- c('SCBI_013','SCBI_002',
                 'SERC_001','SERC_005',
                 'SERC_006','SERC_012',
                 'BLAN_012', 'BLAN_005')
    
    # Filter to the correct plot IDs
    model_dat <- all_neon_tick_data %>%
      dplyr::mutate(target = ixodes_scapularis) %>%
      dplyr::filter(plotID %in% plotIDs) %>%
      dplyr::select(Year, epiWeek, plotID, target) %>%
      dplyr::mutate(epiWeek = as.numeric(epiWeek)) %>%
      dplyr::filter(Year > 2014 & Year < 2021) %>%
      dplyr::mutate(Year_orig = Year)
    
    # Fill in missing observations with NAs
    model_dat %>%
      dplyr::full_join(expand.grid(plotID = unique(model_dat$plotID),
                                   Year_orig = unique(model_dat$Year_orig),
                                   epiWeek = seq(1, 52))) %>%
      dplyr::left_join(all_neon_tick_data %>%
                         dplyr::select(siteID, plotID) %>%
                         dplyr::distinct()) %>%
      dplyr::mutate(series = plotID,
                    season = epiWeek,
                    year = as.vector(scale(Year_orig)),
                    y = target) %>%
      dplyr::select(-Year, -epiWeek, -target) %>%
      dplyr::ungroup() %>%
      dplyr::arrange(Year_orig, season, series) -> model_dat
    
    model_dat %>%
      dplyr::left_join(all_neon_tick_data %>%
                         dplyr::ungroup() %>%
                         dplyr::select(Year, siteID, cum_sdd, cum_gdd) %>%
                         dplyr::mutate(Year_orig = Year) %>%
                         dplyr::select(-Year) %>%
                         dplyr::distinct()) -> model_dat
    
    # Convert ids to factors
    model_dat = model_dat %>%
      dplyr::filter(Year_orig <= 2019) %>%
      dplyr::mutate(plotID = factor(plotID),
                    siteID = factor(siteID),
                    series = factor(series))
  
  # Scale the environmental covariate and store the mean and sd for later plotting
  sd_gdd <- sd(model_dat$cum_gdd)
  mean_gdd <- mean(model_dat$cum_gdd)
  model_dat$cum_gdd <- (model_dat$cum_gdd - mean_gdd) / sd_gdd
  
  # Add time column
  model_dat %>%
    dplyr::ungroup() %>%
    dplyr::group_by(series) %>%
    dplyr::arrange(Year_orig, season) %>%
    dplyr::mutate(time = seq(1, dplyr::n())) %>%
    dplyr::ungroup() -> model_dat
  
  # Split into training and testing
  data_train = model_dat[1:(floor(nrow(model_dat) * split_prop)),]
  data_test = model_dat[((floor(nrow(model_dat) * split_prop)) + 1):nrow(model_dat),]
  
  # Return needed objects in a list
  list(data_train = data_train, data_test = data_test,
       sd_gdd = sd_gdd, mean_gdd = mean_gdd)
}

Prep the data for hierarchical modelling in mvgam using the above function. Here we use a split of 80% training and 20% testing, leaving approximately one year of sampling for forecast evaluation. In the resulting data, the series variable represents the lowest level of aggregation we have in the data (i.e. the plotIDs)

tick_data <- prep_neon_data(split_prop = 0.8)
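The split inside prep_neon_data simply slices off the first 80% of rows (after arranging by time) for training. A minimal sketch of that indexing logic on a toy data frame (toy_dat, split_prop and the other names here are illustrative, not from the tutorial):

```r
# Illustrative sketch of the row-index split used in prep_neon_data(),
# applied to a hypothetical toy data frame
toy_dat <- data.frame(time = 1:10, y = rnorm(10))
split_prop <- 0.8
n_train <- floor(nrow(toy_dat) * split_prop)
toy_train <- toy_dat[1:n_train, ]
toy_test  <- toy_dat[(n_train + 1):nrow(toy_dat), ]
nrow(toy_train)  # 8
nrow(toy_test)   # 2
```

Because the rows are arranged by time before splitting, this leaves the most recent observations for out-of-sample forecast evaluation rather than holding out a random subset.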

Preview the data to understand how it is structured

head(tick_data$data_train)
## # A tibble: 6 × 10
##   plotID   Year_orig siteID series   season  year     y cum_sdd cum_gdd  time
##   <fct>        <dbl> <fct>  <fct>     <dbl> <dbl> <dbl>   <dbl>   <dbl> <int>
## 1 BLAN_005      2015 BLAN   BLAN_005      1 -1.46    NA    173.  -1.73      1
## 2 BLAN_012      2015 BLAN   BLAN_012      1 -1.46    NA    173.  -1.73      1
## 3 SCBI_002      2015 SCBI   SCBI_002      1 -1.46    NA    227.  -0.683     1
## 4 SCBI_013      2015 SCBI   SCBI_013      1 -1.46    NA    227.  -0.683     1
## 5 SERC_001      2015 SERC   SERC_001      1 -1.46    NA    334.  -0.810     1
## 6 SERC_005      2015 SERC   SERC_005      1 -1.46    NA    334.  -0.810     1
dplyr::glimpse(tick_data$data_train)
## Rows: 1,664
## Columns: 10
## $ plotID    <fct> BLAN_005, BLAN_012, SCBI_002, SCBI_013, SERC_001, SERC_005, …
## $ Year_orig <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, …
## $ siteID    <fct> BLAN, BLAN, SCBI, SCBI, SERC, SERC, SERC, SERC, BLAN, BLAN, …
## $ series    <fct> BLAN_005, BLAN_012, SCBI_002, SCBI_013, SERC_001, SERC_005, …
## $ season    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, …
## $ year      <dbl> -1.461502, -1.461502, -1.461502, -1.461502, -1.461502, -1.46…
## $ y         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ cum_sdd   <dbl> 173.30, 173.30, 226.95, 226.95, 334.05, 334.05, 334.05, 334.…
## $ cum_gdd   <dbl> -1.7340922, -1.7340922, -0.6833791, -0.6833791, -0.8097307, …
## $ time      <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, …

Inspecting the time series

There are an enormous number of NAs in these data, as the NEON field team only samples under certain conditions (NAs are shown as red bars in the below plot). As we have seen in previous lectures / tutorials, the vast majority of existing time series algorithms and models would fail immediately if we tried to apply them to these data!

# look only at a few columns as only the y column has NAs
tick_data$data_train %>%
  dplyr::select(plotID, time, y) -> preview_data
image(is.na(t(preview_data %>%
                dplyr::arrange(dplyr::desc(time)))), axes = F,
      col = c('grey80', 'darkred'))
axis(3, at = seq(0,1, len = NCOL(preview_data)), 
     labels = colnames(preview_data))

Because of the high proportion of NAs, a standard time series line plot will not be very useful (though the other plots produced by this call will still be helpful for seeing the distribution of the data):

plot_mvgam_series(data = tick_data$data_train, series = 1)

To inspect the actual time series, we will have to resort to plotting the observations as points rather than lines. A quick call to ggplot2 will be better for our purposes:

theme_set(theme_bw(base_size = 12, base_family = 'serif'))
ggplot(rbind(tick_data$data_train,
             tick_data$data_test) %>%
         dplyr::mutate(Site = siteID), aes(x = time, y = y, fill = Site)) +
  geom_point(shape = 21, colour = 'white', size = 2.5) + 
  facet_wrap(~plotID, ncol = 2, scales = 'free_y') +
  theme(axis.ticks = element_line(colour = "black", size = 1)) +
  labs(x = 'Time', y = 'Abundance')

As you can see, it is very hard to identify any patterns when plotting this way. A key question we’d like to investigate is whether the seasonal patterns in tick abundances vary across sites and plots. We can use some simple visuals to at least inspect variation in the relative counts per season, per site, which may reveal some useful insights:

rbind(tick_data$data_train,
      tick_data$data_test) %>%
  
  # calculate a total count per plot, per year
  dplyr::group_by(plotID, year) %>%
  dplyr::summarise(total_count = sum(y, na.rm = TRUE)) %>%
  
  # join back to the original data
  dplyr::right_join(rbind(tick_data$data_train,
                          tick_data$data_test)) %>%
  
  # calculate the relative count as a proportion
  # by adjusting for the total yearly count
  dplyr::rowwise() %>%
  dplyr::mutate(rel_count = y / total_count) %>%
  
  # for each plotID, calculate the median proportion per
  # season to inspect overall seasonality
  dplyr::group_by(plotID, season) %>%
  dplyr::summarise(rel_count = median(rel_count, 
                                      na.rm = TRUE),
                   Site = unique(siteID)) -> seasonality

# visualize the seasonal patterns for each plotID
ggplot(seasonality, aes(x = season, y = rel_count, fill = Site)) +
  geom_smooth(size = 1.5, se = FALSE, aes(colour = Site)) +
  geom_point(shape = 21, colour = 'white', size = 2.5) + 
  ylim(c(0, 0.7)) +
  facet_wrap(~plotID, ncol = 2) +
  theme(axis.ticks = element_line(colour = "black", size = 1)) +
  labs(x = 'EpiWeek', y = 'Relative count')

This plot demonstrates that there are certainly similarities in the observed seasonal patterns, with tick abundances increasing in the warmer months of the year. We obviously cannot put too much faith in the smooth curves as they are only applied to the median relative counts per site. It would be far better if we could try to learn the overall seasonal patterns by leveraging all the data we have (regardless of how messy and sparse it is). Another consideration is that these data are strongly overdispersed, with a skew towards zeros and very small counts.

ggplot(rbind(tick_data$data_train,
             tick_data$data_test) %>%
         dplyr::mutate(Site = siteID), aes(x = y, fill = Site)) +
  geom_histogram() + 
  facet_wrap(~plotID, ncol = 2, scales = 'free_y') +
  theme(axis.ticks = element_line(colour = "black", size = 1)) +
  labs(x = 'Abundance', y = 'Frequency')
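A quick numeric check for overdispersion is to compare the sample mean and variance of the counts; for a Poisson distribution they should be roughly equal. Below is a sketch of that comparison on simulated negative binomial counts, mimicking the skewed, zero-heavy distributions in the histograms above (sim_counts is an illustrative name; the commented lines show how the same summary could be applied to the tick data):

```r
set.seed(100)
# Simulate overdispersed counts from a negative binomial
# distribution with a small size (dispersion) parameter
sim_counts <- rnbinom(500, size = 0.5, mu = 3)
mean(sim_counts)
var(sim_counts)   # far exceeds the mean, indicating overdispersion

# The same mean-variance comparison could be made per series, e.g.:
# tick_data$data_train %>%
#   dplyr::group_by(series) %>%
#   dplyr::summarise(mean_y = mean(y, na.rm = TRUE),
#                    var_y = var(y, na.rm = TRUE))
```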

Exercises

  1. Calculate the number of timepoints in the training data that have non-missing observations for all eight time series.

Modelling tick abundance

We aim to build a model that can learn from all observed data and leverage the grouping structure in the observations to capture variation in seasonality and long-term trends for tick abundances among NEON sampling sites. These models can get quite complex, so it is often useful to start simple and work our way up in complexity.

Complete pooling

Our first model will assume there is a single “global” smooth function that can adequately capture periodic seasonality in tick abundances. We use a cyclic smooth (bs = 'cc') and supply a list of knots to ensure the smooth joins at the ends in the correct place. We also include a hierarchical intercept term to capture any variation in series-specific average counts. Finally, we use a Poisson observation model for these initial models, as we’d like to capture as much of the dispersion in the data as possible using our covariates, even though it may not be possible to capture all of it. No dynamic trend component is included in this initial model, as we would like to describe as much of the variation as we can using the GAM linear predictor. Instead we use a low-wiggliness smooth function of year for each plot. This should not only allow us to home in on a useful model more quickly, it will also make model diagnostics more informative, because we avoid the near-perfect overfitting that can happen with flexible dynamic trend components

model0 <- mvgam(y ~   
                  # hierarchical intercepts per series (plotID)
                  s(series, bs = 're') +
                  
                  # separate smooths of year to capture changing levels across
                  # the sampling period for each series (plotID)
                  s(year, by = series, k = 4) +
                  
                  # a "global" smooth of season to capture the overall
                  # seasonal pattern
                  s(season, bs = 'cc', k = 12),
                
                # knots to ensure the seasonal smooth joins between the last 
                # week and first week of the year
                knots = list(season = c(0.5, 52.5)),
                
                # Poisson observations
                family = poisson(),
                data = tick_data$data_train,
                newdata = tick_data$data_test,
                
                # No dynamic trend yet
                trend_model = 'None')
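To see what the knots argument achieves, we can construct the same cyclic basis directly in mgcv (which mvgam uses under the hood) and confirm that the basis functions take identical values at the two boundary knots (0.5 and 52.5), so the seasonal smooth joins across the year boundary. A standalone sketch, assuming mgcv is available:

```r
library(mgcv)

# Build the cyclic basis outside of any model; the two supplied knots
# define the boundaries of the cyclic period
sm <- smoothCon(s(season, bs = 'cc', k = 12),
                data = data.frame(season = 1:52),
                knots = list(season = c(0.5, 52.5)))[[1]]

# Evaluate the basis at the two boundary knots
X <- PredictMat(sm, data.frame(season = c(0.5, 52.5)))

# Effectively zero: the basis (and hence the smooth) matches
# at epiWeek 0.5 and epiWeek 52.5
max(abs(X[1, ] - X[2, ]))
```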

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{ticks}_t & \sim \text{Poisson}(\lambda_t) \\ \log(\lambda_t) & = \alpha_{plot} + f(\boldsymbol{time})_t + f_{global}(\boldsymbol{epiweek})_t \\ \alpha_{plot} & \sim \text{Normal}(\mu_{plot}, \sigma_{plot}) \\ \mu_{plot} & \sim \text{Normal}(0, 1) \\ \sigma_{plot} & \sim \text{Exponential}(0.5) \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b_{k,time} * \beta_{k,time} \\ f_{global}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{k,global} * \beta_{k,global} \end{align*}\]

The model summary is shown below, indicating no evidence that the HMC sampler encountered problematic regions of the posterior

summary(model0)
## GAM formula:
## y ~ s(year, by = series, k = 4) + s(season, bs = "cc", k = 12) + 
##     s(series, bs = "re")
## 
## Family:
## poisson
## 
## Link function:
## log
## 
## Trend model:
## None
## 
## N series:
## 8 
## 
## N timepoints:
## 208 
## 
## Status:
## Fitted using Stan 
## 
## GAM coefficient (beta) estimates:
##                                2.5%           50%       97.5% Rhat n.eff
## (Intercept)               0.4969709  0.8094075000  1.11260925 1.00  2795
## s(year):seriesBLAN_005.1  0.1633081  1.4027750000  2.44772000 1.00   686
## s(year):seriesBLAN_005.2  0.8750620  1.6987450000  2.46915450 1.00   652
## s(year):seriesBLAN_005.3 -0.6626700 -0.1252785000  0.30616007 1.00   822
## s(year):seriesBLAN_012.1 -0.2104639  0.0046403600  0.22337272 1.00  1384
## s(year):seriesBLAN_012.2 -0.1653804 -0.0049617500  0.15581930 1.00  1437
## s(year):seriesBLAN_012.3 -0.7239774 -0.2381660000  0.10931290 1.00  1396
## s(year):seriesSCBI_002.1 -0.2296308  0.0181059000  0.41774940 1.00   665
## s(year):seriesSCBI_002.2 -0.3158023 -0.0102719500  0.16059092 1.01   516
## s(year):seriesSCBI_002.3 -0.3704838 -0.0931235500  0.16538527 1.00  1887
## s(year):seriesSCBI_013.1  0.1684820  3.4294100000  5.78533275 1.00   403
## s(year):seriesSCBI_013.2 -1.4671755 -0.6609625000  0.08164576 1.00   617
## s(year):seriesSCBI_013.3 -0.1142274  1.2796800000  2.32993050 1.00   421
## s(year):seriesSERC_001.1 -0.2075342  0.0107555000  0.26109055 1.00   717
## s(year):seriesSERC_001.2 -0.1651900 -0.0033961200  0.17335480 1.00  1305
## s(year):seriesSERC_001.3 -0.3466020 -0.1068445000  0.12470052 1.00  1772
## s(year):seriesSERC_005.1 -0.2446294  0.0004505835  0.24629345 1.00  1216
## s(year):seriesSERC_005.2 -0.1523921  0.0105536000  0.24378792 1.00  1079
## s(year):seriesSERC_005.3 -0.2891965 -0.0068986350  0.26850315 1.00  1808
## s(year):seriesSERC_006.1 -0.2070458  0.0110600500  0.30245947 1.00   814
## s(year):seriesSERC_006.2 -0.3047147 -0.0153958500  0.13852260 1.00   545
## s(year):seriesSERC_006.3 -0.3825399 -0.0856099000  0.17552377 1.00  2061
## s(year):seriesSERC_012.1 -0.2317776  0.0110899500  0.34081530 1.01   575
## s(year):seriesSERC_012.2 -0.5427029 -0.0299685000  0.12483807 1.00   469
## s(year):seriesSERC_012.3 -0.3134184 -0.0708561000  0.17454350 1.00  1146
## s(season).1              -4.9247987 -2.6506500000 -0.53193908 1.01   323
## s(season).2              -3.9917680 -2.2554900000 -0.71617128 1.00  1552
## s(season).3              -1.1898650  0.0519295500  1.31137900 1.02   293
## s(season).4               2.7160087  3.8592400000  5.22352000 1.02   191
## s(season).5               3.1440633  4.2918550000  5.63295575 1.02   189
## s(season).6               1.6639433  2.7870500000  4.18051450 1.02   196
## s(season).7               0.7211883  1.9221700000  3.28920625 1.02   208
## s(season).8               0.6011475  1.8269800000  3.19172300 1.02   211
## s(season).9              -0.4832131  0.7554145000  2.11849400 1.02   239
## s(season).10             -2.5959430 -1.2095850000 -0.11488445 1.00  1516
## s(series).1              -2.6659800 -1.7685200000 -0.95614913 1.01   307
## s(series).2              -4.4130520 -3.4524400000 -2.60430050 1.02   246
## s(series).3              -3.5050365 -2.6066650000 -1.80178350 1.02   270
## s(series).4              -2.7105095 -1.8194900000 -1.02413750 1.02   337
## s(series).5              -3.2329137 -2.3108750000 -1.53203050 1.02   281
## s(series).6              -3.6371293 -2.7200550000 -1.87518075 1.02   264
## s(series).7              -3.5627540 -2.6537050000 -1.82975425 1.01   271
## s(series).8              -3.0680670 -2.1938300000 -1.42681775 1.02   273
## 
## GAM group-level estimates:
##                    2.5%       50%     97.5% Rhat n.eff
## mean(series) -3.2170062 -2.282095 -1.366656 1.02   329
## sd(series)    0.3673128  0.651237  1.506022 1.00   510
## 
## GAM observation smoothing parameter (rho) estimates:
##                                     2.5%       50%      97.5% Rhat n.eff
## s(year):seriesBLAN_005_rho  -4.267689250 -2.210250 -0.4381095 1.00   507
## s(year):seriesBLAN_0052_rho  0.410546975  3.047795  4.1282832 1.00   724
## s(year):seriesBLAN_012_rho   0.807372750  3.125425  4.1486367 1.00   936
## s(year):seriesBLAN_0122_rho  0.002776823  2.869580  4.1261930 1.01   748
## s(year):seriesSCBI_002_rho   0.126671625  2.987280  4.1505548 1.01   397
## s(year):seriesSCBI_0022_rho  0.679231250  3.166480  4.1965415 1.00   718
## s(year):seriesSCBI_013_rho  -5.385155000 -3.278665  2.1266155 1.01   209
## s(year):seriesSCBI_0132_rho -2.189508750  0.505839  3.5579585 1.00   526
## s(year):seriesSERC_001_rho   0.936888150  3.111405  4.1860500 1.00  1006
## s(year):seriesSERC_0012_rho  1.207800750  3.215520  4.1986210 1.00  1334
## s(year):seriesSERC_005_rho   0.738918525  3.118455  4.1142645 1.01   787
## s(year):seriesSERC_0052_rho  1.105079500  3.247550  4.2049797 1.01   755
## s(year):seriesSERC_006_rho   0.264355825  3.024720  4.1189072 1.00   576
## s(year):seriesSERC_0062_rho  1.123593000  3.232640  4.1566072 1.00   900
## s(year):seriesSERC_012_rho  -0.613150175  2.977110  4.1278502 1.00   429
## s(year):seriesSERC_0122_rho  1.037192250  3.234185  4.2067302 1.00  1018
## s(season)_rho                0.429231975  1.643685  2.5700705 1.00  1236
## s(series)_rho                0.137614300  3.024400  4.1400680 1.00   769
## 
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior

The model has found support for varying intercepts among series:

plot(model0, type = 're')

We can also plot these effects as intervals by passing some regex to the mcmc_plot function:

mcmc_plot(model0, 
          type = 'intervals', 
          variable = 's\\(series\\)\\.',
          regex = TRUE)

Or as areas:

mcmc_plot(model0, 
          type = 'areas', 
          variable = 's\\(series\\)\\.',
          regex = TRUE)

The model has also estimated declining trends for most of the series

plot(model0, type = 'smooths')

But as we’ve seen previously, we’re better off using targeted predictions to understand the model and think about how to make improvements. The plot_predictions function from marginaleffects can be used here to inspect conditional seasonal predictions for each series. This is a good first step to ensure the model’s predictions seem sensible

plot_predictions(model0, condition = c('season', 'series', 'series'),
                 points = 0.5, rug = TRUE) +
  facet_wrap(~series, ncol = 2, scales = 'free_y') +
  theme(legend.position = 'none')

Plotting residuals vs season

The above plot shows us that the predictions seem to match the overall periodic patterns in the observations, but it is hard to get a sense of where the model might be failing. To do so, we can inspect posterior median residuals vs season to get a sense of any unmodelled seasonal variation across time series. This can be done using a very similar call to ggplot2 as the one we used above to inspect seasonality in relative counts. Because we will be making such a plot for each model, it is better to wrap the code in a function:

plot_season_resids = function(object){
  # Take 100 posterior draws of Dunn Smyth residuals in the correct
  # order (arranged by series and then time)
  resid_points <- lapply(seq_len(100), function(j){
    data.frame(series = tick_data$data_train %>%
                 dplyr::arrange(series, time) %>%
                 dplyr::pull(series),
               resid = unlist(lapply(seq_along(object$resids), function(x){
                 object$resids[[x]][j, ]
               })))
  })
  
  # Bind arranged data to each of the 100 posterior residual draws
  all_resids <- do.call(rbind, lapply(seq_len(100), function(j){
    tick_data$data_train %>%
      dplyr::arrange(series, time) %>%
      dplyr::mutate(Site = siteID) %>%
      dplyr::bind_cols(data.frame(resid = resid_points[[j]]$resid))
  }))
  
  # Plot
  ggplot(all_resids, aes(x = season, y = resid, fill = Site)) +
    geom_jitter(shape = 16, colour = 'black', size = 1,
                alpha = 0.1) +
    geom_smooth(size = 1.5, se = FALSE, colour = 'white',
                method = 'gam', formula = y ~ s(x, k = 8)) +
    geom_smooth(size = 1.25, se = FALSE, aes(colour = Site),
                method = 'gam', formula = y ~ s(x, k = 8)) +
    facet_wrap(~plotID, ncol = 2) +
    ylim(c(-4, 4)) +
    xlim(c(9, 52)) +
    theme(axis.ticks = element_line(colour = "black", size = 1)) +
    labs(x = 'EpiWeek', y = 'DS residuals')
}
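For intuition about the residuals this function extracts, Dunn-Smyth (randomised quantile) residuals for a discrete response are computed by drawing uniformly within each observation's CDF step and mapping to standard normal quantiles. A minimal base-R sketch for a Poisson model (this is not mvgam's internal code; y and mu are illustrative):

```r
set.seed(42)
y  <- rpois(200, lambda = 4)   # toy observed counts
mu <- rep(4, 200)              # toy fitted means

# Randomised quantile (Dunn-Smyth) residuals: draw uniformly between
# the CDF evaluated just below and at each observation, then map to
# standard normal quantiles
u <- runif(200,
           min = ppois(y - 1, mu),
           max = ppois(y, mu))
ds_resid <- qnorm(u)

# If the model is well calibrated, these residuals are approximately
# standard normal
c(mean(ds_resid), sd(ds_resid))
```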

When we call this function and provide an mvgam model, it will plot 100 posterior draws of Dunn-Smyth residuals and then plot a smooth curve over the top to help us inspect any possible nonlinear relationships that have not been captured by the model:

plot_season_resids(object = model0)

The fact that we have support for systematic, nonlinear relationships between residuals and seasonal periods suggests our simple assumption of fixed seasonality among all sites is not adequate. We can supplement this plot with a similar investigation of median posterior partial residuals, which can be readily calculated by predicting from the underlying mgcv object in the model. Below is a function to plot partial residuals for 100 posterior draws against the smooth of seasonality for each series:

plot_season_partials = function(object){
  # Take 100 posterior draws of Dunn Smyth residuals in the correct
  # order (arranged by series and then time)
  resid_points <- lapply(seq_len(100), function(j){
    data.frame(series = tick_data$data_train %>%
                 dplyr::arrange(series, time) %>%
                 dplyr::pull(series),
               resid = unlist(lapply(seq_along(object$resids), function(x){
                 object$resids[[x]][j, ]
               })))
  })
  
  # Calculate link scale predictions for the observations, ignoring all
  # non-seasonal smooth terms
  all_terms <- unlist(lapply(seq_along(object$mgcv_model$smooth), function(x){
     object$mgcv_model$smooth[[x]]$label
  }))
  non_season_terms <- all_terms[!grepl('season', all_terms, fixed = TRUE)]
  
  tick_data$data_train %>%
    dplyr::bind_cols(data.frame(smooth_pred = predict(object$mgcv_model,
                                                      exclude = non_season_terms,
                                                      type = 'link'))) %>%
    dplyr::arrange(series, time) %>%
    dplyr::mutate(Site = siteID) -> smooth_preds
  
  # Calculate partial residuals for each of the 100 posterior residual draws
  all_partials <- do.call(rbind, lapply(seq_len(100), function(j){
    smooth_preds %>%
      dplyr::bind_cols(data.frame(resid = resid_points[[j]]$resid)) %>%
      dplyr::mutate(partial_resid = smooth_pred + resid)
  }))
  
  # Plot
  ggplot(all_partials, aes(x = season, y = smooth_pred, colour = Site)) +
    geom_jitter(shape = 16, colour = 'black', size = 1,
                alpha = 0.1,
                mapping = aes(x = season, y = partial_resid)) +
        geom_smooth(size = 1.5, se = FALSE, colour = 'white',
                method = 'gam', formula = y ~ s(x, bs = 'cc')) +
    geom_smooth(size = 1.25, se = FALSE, aes(colour = Site),
                method = 'gam', formula = y ~ s(x, bs = 'cc')) +
    facet_wrap(~plotID, ncol = 2) +
    theme(axis.ticks = element_line(colour = "black", size = 1)) +
    labs(x = 'EpiWeek', y = 'Median smooth & partial residuals')
}
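The key mechanism in the function above is predict()'s exclude argument, which zeroes out the contributions of named smooth terms so that link-scale predictions reflect only the seasonal smooths. A standalone mgcv sketch on toy data (all names here are illustrative):

```r
library(mgcv)
set.seed(1)

# Toy Poisson data with two smooth effects
toy <- data.frame(x1 = runif(200), x2 = runif(200))
toy$y <- rpois(200, exp(sin(2 * pi * toy$x1) + 0.5 * toy$x2))
m <- gam(y ~ s(x1) + s(x2), data = toy, family = poisson())

# Full link-scale predictions vs predictions with s(x2) zeroed out
p_full <- predict(m, newdata = toy, type = 'link')
p_x1   <- predict(m, newdata = toy, type = 'link', exclude = 's(x2)')
# p_x1 isolates the intercept plus the s(x1) contribution
```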

Now we can use this function to plot partial residuals against seasonality for our first model

plot_season_partials(object = model0)

This plot shows that partial residuals concentrate above or below the smooth in certain EpiWeeks for many of the series (look at SCBI_013 in weeks 30 - 40, for example). This again suggests we need to expand the model to better represent variation in periodic seasonal patterns

Site-level partial pooling

The next model will use hierarchical smooths to try to capture seasonality. As in the first model, we retain the global cyclic smooth of season. But we now add a second set of seasonal smooths that can vary among siteIDs. This is the second level of aggregation in our data (i.e. national -> site -> plot). So we are now assuming that plots within the same site may share the same seasonal patterns, but these patterns may differ from the seasonal patterns at other sites. Once again, we do not include a dynamic trend component yet:

model1 <- mvgam(y ~                  
                  
                  # hierarchical intercepts per series (plotID)
                  s(series, bs = 're') +
                  
                  # separate smooths of year to capture changing levels across
                  # the sampling period for each series (plotID)
                  s(year, by = series, k = 4) +
                  
                  # a "global" smooth of season to capture the overall
                  # seasonal pattern
                  s(season, bs = 'cc', k = 12) +
                  
                  # a set of "deviation" smooths to capture site-level 
                  # variation in seasonal patterns
                  s(season, by = siteID, bs = 'cc', k = 12),
                
                # knots to ensure the seasonal smooth joins between the last 
                # week and first week of the year
                knots = list(season = c(0.5, 52.5)),
                
                # Poisson observations
                family = poisson(),
                data = tick_data$data_train,
                newdata = tick_data$data_test,
                
                # No trend
                trend_model = 'None')

Note the use of the by argument in the site-level smooths. This implies that each site-level smooth has its own unique smoothing parameter, so we are not necessarily forcing all the deviations to have the same level of smoothness. This is fine for our purposes, as we do not have very many levels of siteID. But when fitting hierarchical smooths it can often make sense to use a different strategy, which will be shown in the next model below. Returning to this model, it can be described mathematically as follows:

\[\begin{align*} \boldsymbol{ticks}_t & \sim \text{Poisson}(\lambda_t) \\ \log(\lambda_t) & = \alpha_{plot} + f(\boldsymbol{time})_t + f_{global}(\boldsymbol{epiweek})_t + f_{site}(\boldsymbol{epiweek})_t \\ \alpha_{plot} & \sim \text{Normal}(\mu_{plot}, \sigma_{plot}) \\ \mu_{plot} & \sim \text{Normal}(0, 1) \\ \sigma_{plot} & \sim \text{Exponential}(0.5) \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b_{k,time} * \beta_{k,time} \\ f_{global}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{k,global} * \beta_{k,global} \\ f_{site}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{k,site} * \beta_{k,site} \end{align*}\]

The model summary is shown below, again indicating no evidence of problematic regions of the posterior

summary(model1)
## GAM formula:
## y ~ s(year, by = series, k = 4) + s(season, bs = "cc", k = 12) + 
##     s(season, by = siteID, bs = "cc", k = 12) + s(series, bs = "re")
## 
## Family:
## poisson
## 
## Link function:
## log
## 
## Trend model:
## None
## 
## N series:
## 8 
## 
## N timepoints:
## 208 
## 
## Status:
## Fitted using Stan 
## 
## GAM coefficient (beta) estimates:
##                                 2.5%           50%       97.5% Rhat n.eff
## (Intercept)               0.46816463  0.8342690000  1.19991625 1.00  3679
## s(year):seriesBLAN_005.1  0.31284692  1.5148700000  2.61405425 1.00  1433
## s(year):seriesBLAN_005.2  1.00204075  1.7651900000  2.55000225 1.00  2270
## s(year):seriesBLAN_005.3 -0.59540252 -0.0837274000  0.34672437 1.00  1419
## s(year):seriesBLAN_012.1 -0.23342163  0.0045733900  0.28268590 1.00  1267
## s(year):seriesBLAN_012.2 -0.20858500 -0.0050512600  0.16487362 1.00   486
## s(year):seriesBLAN_012.3 -0.69093170 -0.2178950000  0.09485138 1.00  2083
## s(year):seriesSCBI_002.1 -0.24101790  0.0170660000  0.38362900 1.00   764
## s(year):seriesSCBI_002.2 -0.28868580 -0.0090060650  0.15147402 1.01   501
## s(year):seriesSCBI_002.3 -0.38521102 -0.1046905000  0.15388615 1.00  2622
## s(year):seriesSCBI_013.1  0.08231653  3.0799000000  5.47703500 1.01   510
## s(year):seriesSCBI_013.2 -1.56818300 -0.7300160000  0.03963862 1.00   931
## s(year):seriesSCBI_013.3 -0.16917115  1.1192250000  2.25875550 1.01   538
## s(year):seriesSERC_001.1 -0.20600475  0.0112908000  0.28933447 1.00  1102
## s(year):seriesSERC_001.2 -0.18319125 -0.0014543400  0.16767207 1.00  1401
## s(year):seriesSERC_001.3 -0.34978110 -0.1040000000  0.12394342 1.00  2529
## s(year):seriesSERC_005.1 -0.26546835  0.0025161300  0.27118922 1.00   702
## s(year):seriesSERC_005.2 -0.17569080  0.0049650800  0.22039425 1.00   948
## s(year):seriesSERC_005.3 -0.27953000  0.0007236135  0.28965405 1.00  2782
## s(year):seriesSERC_006.1 -0.26421440  0.0081618500  0.41302430 1.00   696
## s(year):seriesSERC_006.2 -0.41936175 -0.0165197500  0.14838862 1.01   450
## s(year):seriesSERC_006.3 -0.38328870 -0.0730858500  0.22062960 1.00  2390
## s(year):seriesSERC_012.1 -0.23840940  0.0117052000  0.45332770 1.00   444
## s(year):seriesSERC_012.2 -0.75086905 -0.0297250000  0.11930162 1.01   233
## s(year):seriesSERC_012.3 -0.32746298 -0.0637583000  0.20810600 1.00   948
## s(season).1              -5.09815450 -2.6040150000 -0.15557000 1.01   781
## s(season).2              -4.18695000 -2.1381050000 -0.15297490 1.00   754
## s(season).3              -1.75477125  0.1483805000  2.06819725 1.01   554
## s(season).4               1.58774625  3.8122600000  5.97759500 1.02   375
## s(season).5               1.91997850  4.2243350000  6.50555850 1.02   354
## s(season).6               0.59036197  2.8522450000  5.09480525 1.02   375
## s(season).7              -0.13740212  1.8854200000  4.06821150 1.01   454
## s(season).8              -0.25128292  1.6640500000  3.70030250 1.01   559
## s(season).9              -1.20246600  0.5815295000  2.41237650 1.00   691
## s(season).10             -2.98527200 -1.2448750000  0.35118107 1.01   569
## s(season):siteIDBLAN.1   -1.97533925 -0.2693840000  1.15715800 1.01   543
## s(season):siteIDBLAN.2   -2.07025200 -0.3198105000  1.09049950 1.01   517
## s(season):siteIDBLAN.3   -1.95916225 -0.1717450000  1.42673575 1.01   525
## s(season):siteIDBLAN.4   -1.68002875  0.1823985000  2.23321125 1.01   447
## s(season):siteIDBLAN.5   -1.83213550 -0.0126395000  2.20906525 1.01   404
## s(season):siteIDBLAN.6   -1.88049125 -0.0717744000  2.02890250 1.01   438
## s(season):siteIDBLAN.7   -1.53459575  0.2511185000  2.23297950 1.01   489
## s(season):siteIDBLAN.8   -1.14864500  0.5183085000  2.40951725 1.00   494
## s(season):siteIDBLAN.9   -1.10835075  0.3429390000  2.02028950 1.01   460
## s(season):siteIDBLAN.10  -1.32936150  0.0564726000  1.51090150 1.01   468
## s(season):siteIDSCBI.1   -1.85373200 -0.2359500000  1.43243525 1.00   589
## s(season):siteIDSCBI.2   -1.95165275 -0.1495890000  1.45089950 1.00   659
## s(season):siteIDSCBI.3   -1.72972925  0.0821009000  1.75995750 1.00   627
## s(season):siteIDSCBI.4   -1.06935575  0.8675080000  2.92882200 1.01   479
## s(season):siteIDSCBI.5   -0.78455060  1.1807650000  3.45098150 1.01   404
## s(season):siteIDSCBI.6   -1.09649025  0.8339260000  2.95521425 1.00   386
## s(season):siteIDSCBI.7   -1.67175075  0.1692355000  2.19831250 1.00   480
## s(season):siteIDSCBI.8   -2.27880650 -0.4040990000  1.37069600 1.01   548
## s(season):siteIDSCBI.9   -2.37133225 -0.5803840000  1.09464700 1.01   577
## s(season):siteIDSCBI.10  -1.99624175 -0.4830710000  0.94729275 1.00   601
## s(season):siteIDSERC.1   -1.51250375  0.0544080000  1.48770950 1.00   547
## s(season):siteIDSERC.2   -1.48850425  0.0371939000  1.61020950 1.00   583
## s(season):siteIDSERC.3   -1.58423050  0.0971275500  1.71819175 1.01   522
## s(season):siteIDSERC.4   -2.12894750 -0.2360730000  1.76408425 1.02   375
## s(season):siteIDSERC.5   -2.25391625 -0.3301740000  1.84005100 1.02   356
## s(season):siteIDSERC.6   -2.17378025 -0.2907480000  1.64904125 1.01   382
## s(season):siteIDSERC.7   -2.00600175 -0.1607155000  1.60351600 1.01   474
## s(season):siteIDSERC.8   -1.50197225  0.0696709500  1.74963850 1.00   526
## s(season):siteIDSERC.9   -1.21672050  0.2474965000  1.82038800 1.01   504
## s(season):siteIDSERC.10  -1.27148425  0.1908345000  1.49927725 1.01   431
## s(series).1              -2.92897975 -1.8379600000 -0.91567108 1.00  1338
## s(series).2              -4.64112375 -3.5413950000 -2.52848075 1.00  1316
## s(series).3              -4.96812100 -3.4775050000 -2.44741250 1.01  1069
## s(series).4              -4.05881150 -2.6488850000 -1.62414400 1.01  1036
## s(series).5              -3.02263825 -2.0624600000 -1.19488500 1.00  1332
## s(series).6              -3.44185725 -2.4729300000 -1.61706725 1.00  1485
## s(series).7              -3.39435050 -2.4179650000 -1.55561225 1.00  1385
## s(series).8              -2.93639975 -1.9563550000 -1.08375900 1.00  1333
## 
## GAM group-level estimates:
##                    2.5%       50%     97.5% Rhat n.eff
## mean(series) -3.2957950 -2.330215 -1.241016    1   788
## sd(series)    0.4331456  0.807247  1.910239    1   599
## 
## GAM observation smoothing parameter (rho) estimates:
##                                   2.5%       50%      97.5% Rhat n.eff
## s(year):seriesBLAN_005_rho  -4.3135643 -2.319625 -0.7480162 1.00  2388
## s(year):seriesBLAN_0052_rho  0.8788987  3.123135  4.1272585 1.00  1180
## s(year):seriesBLAN_012_rho   0.5252752  3.049335  4.1440122 1.00   684
## s(year):seriesBLAN_0122_rho  0.3754863  2.993535  4.1010455 1.00  1255
## s(year):seriesSCBI_002_rho  -0.1789456  3.091285  4.1742723 1.00   471
## s(year):seriesSCBI_0022_rho  0.9652110  3.158585  4.2067203 1.00  1315
## s(year):seriesSCBI_013_rho  -5.3610368 -3.054240  2.7471820 1.01   337
## s(year):seriesSCBI_0132_rho -2.1208885  0.736593  3.7441307 1.00   775
## s(year):seriesSERC_001_rho   0.7262375  3.128245  4.1372590 1.00   712
## s(year):seriesSERC_0012_rho  0.9539738  3.202130  4.1796245 1.01   711
## s(year):seriesSERC_005_rho   0.2991981  3.110050  4.1284362 1.01   506
## s(year):seriesSERC_0052_rho  0.8959293  3.240420  4.1835935 1.00   981
## s(year):seriesSERC_006_rho  -0.4136433  2.946475  4.1361082 1.01   406
## s(year):seriesSERC_0062_rho  1.0363718  3.206140  4.1615402 1.00  1258
## s(year):seriesSERC_012_rho  -1.3930422  2.973650  4.1137150 1.01   219
## s(year):seriesSERC_0122_rho  1.0079603  3.246600  4.1810947 1.00  1213
## s(season)_rho                0.4670998  1.740860  2.8102270 1.00  1104
## s(season):siteIDBLAN_rho     2.0175722  3.427165  4.2405267 1.00   614
## s(season):siteIDSCBI_rho     1.6738382  3.269695  4.1651625 1.00   761
## s(season):siteIDSERC_rho     2.1646907  3.508325  4.3084687 1.00   940
## s(series)_rho                0.1619995  3.021285  4.1683842 1.00  1002
## 
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior

A plot of the smooth functions from this model now shows the “global” smooth of season as well as the site-level deviation smooths:

plot(model1, type = 'smooths')

This plot shows that site-level deviations from the “global” smooth are strongly supported for site SCBI, but less so for the other sites. We can repeat the plots from above to see if the model does a better job of capturing seasonal variation:

plot_predictions(model1, condition = c('season', 'series', 'series'),
                 points = 0.5, rug = TRUE) +
  facet_wrap(~series, ncol = 2, scales = 'free_y') +
  theme(legend.position = 'none')

As before, the conditional predictions plot looks reasonable but is harder to use for diagnosing unmodelled variation. The plot of residuals vs seasonality is more informative:

plot_season_resids(object = model1)

The trend lines are flatter for the two SCBI plots, but nonlinear patterns remain for many of the individual series. The plot of partial residuals also shows only minor improvement:

plot_season_partials(object = model1)

Plot-level partial pooling

The next model is similar to model1 in that it also uses hierarchical smooths to capture seasonality. This time, however, the seasonal deviation smooths are allowed to vary among plotIDs. This is the lowest level of aggregation in our data, so we are now assuming that the seasonal pattern for each unique time series may differ from the seasonal patterns of the other series. The model also illustrates a different way of setting up hierarchical smooths, this time using the factor-smooth interaction basis. This basis forces each level of the factor variable (series, in this case) to share a single smoothing penalty, which can be more efficient when a factor has many levels because the level-specific smooths learn from one another through the shared penalty. See ?mgcv::smooth.construct.fs.smooth.spec for details. Because these smooths are not zero-centred, they can absorb variation in intercepts, so we technically could remove the hierarchical intercepts from this model. In practice, however, I find that doing so results in more problematic posterior spaces (divergences and poor convergence), so I prefer to keep the intercepts in the model. Once again, we do not include a dynamic trend component yet:

model2 <- mvgam(y ~                  
                  #hierarchical intercepts per series (plotID)
                  s(series, bs = 're') +
                    
                  # separate smooths of year to capture changing levels across
                  # the sampling period for each series (plotID)
                  s(year, by = series, k = 4) +
                  
                  # a "global" smooth of season to capture the overall
                  # seasonal pattern
                  s(season, bs = 'cc', k = 12) +
                  
                  # a set of "deviation" smooths to capture plot-level 
                  # variation in seasonal patterns; here we use the 
                  # fs basis to force each plot-level deviation smooth to 
                  # use the same smoothing penalty, which makes the
                  # deviations behave more like 'random effects'
                  s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 12),
                
                # knots to ensure the seasonal smooth joins between the last 
                # week and first week of the year
                knots = list(season = c(0.5, 52.5)),
                
                # Poisson observations
                family = poisson(),
                data = tick_data$data_train,
                newdata = tick_data$data_test,
                
                # No trend
                trend_model = 'None')

The model can be described mathematically as follows:

\[\begin{align*} \boldsymbol{ticks}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \alpha_{plot} + f(\boldsymbol{time})_t + f_{global}(\boldsymbol{epiweek})_t + f_{plot}(\boldsymbol{epiweek})_t \\ \alpha_{plot} & \sim \text{Normal}(\mu_{plot}, \sigma_{plot}) \\ \mu_{plot} & \sim \text{Normal}(0, 1) \\ \sigma_{plot} & \sim \text{Exponential}(0.5) \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b_{time,k} * \beta_{time,k} \\ f_{global}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{global,k} * \beta_{global,k} \\ f_{plot}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{plot,k} * \beta_{plot,k} \end{align*}\]
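To see the factor-smooth interaction basis in isolation, here is a minimal sketch using mgcv's built-in data simulator rather than the tick data (the object names `dat` and `m` are purely illustrative):

```r
library(mgcv)

# simulate grouped data: gamSim(4) returns a response y, covariates x0-x2
# and a grouping factor 'fac'
set.seed(1)
dat <- gamSim(4, n = 400)

# a 'global' smooth of x2 plus factor-smooth deviations per level of fac;
# all deviation smooths share a single smoothing penalty
m <- gam(y ~ s(x2) + s(x2, fac, bs = 'fs', k = 5),
         data = dat, method = 'REML')

summary(m)
plot(m, pages = 1)
```

The shared penalty is what makes the `fs` basis behave like a random effect: shrinking all group-level deviations toward the global smooth with a single variance parameter.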

The model summary reports only a handful of divergent transitions (4 of 2000 iterations), so there is little evidence of a seriously problematic posterior space:

summary(model2)
## GAM formula:
## y ~ s(year, by = series, k = 4) + s(season, bs = "cc", k = 12) + 
##     s(season, series, bs = "fs", xt = list(bs = "cc"), k = 12) + 
##     s(series, bs = "re")
## 
## Family:
## poisson
## 
## Link function:
## log
## 
## Trend model:
## None
## 
## N series:
## 8 
## 
## N timepoints:
## 208 
## 
## Status:
## Fitted using Stan 
## 
## GAM coefficient (beta) estimates:
##                                 2.5%           50%       97.5% Rhat n.eff
## (Intercept)               0.40998955  0.7274035000  1.04592275 1.00  2248
## s(year):seriesBLAN_005.1  0.07867341  1.4314650000  2.55170850 1.01   597
## s(year):seriesBLAN_005.2  0.93695038  1.7276950000  2.46661725 1.00  1313
## s(year):seriesBLAN_005.3 -0.77006893 -0.1158275000  0.31166540 1.00   568
## s(year):seriesBLAN_012.1 -0.24003635  0.0077180850  0.32800812 1.00   951
## s(year):seriesBLAN_012.2 -0.19971013  0.0014184150  0.19814507 1.00   447
## s(year):seriesBLAN_012.3 -0.61415390 -0.1743985000  0.15059142 1.00  1558
## s(year):seriesSCBI_002.1 -0.20521660  0.0204889000  0.37926402 1.00   767
## s(year):seriesSCBI_002.2 -0.25374507 -0.0093292750  0.17118640 1.00   726
## s(year):seriesSCBI_002.3 -0.38050957 -0.1013575000  0.15867530 1.00  1719
## s(year):seriesSCBI_013.1  0.11546978  3.0544250000  5.26578625 1.01   494
## s(year):seriesSCBI_013.2 -1.52907375 -0.6846090000  0.04222510 1.00   891
## s(year):seriesSCBI_013.3 -0.16444475  1.0934550000  2.14500025 1.01   517
## s(year):seriesSERC_001.1 -0.19163598  0.0114954000  0.29008272 1.00  1054
## s(year):seriesSERC_001.2 -0.16287305 -0.0027803450  0.17175875 1.00  1126
## s(year):seriesSERC_001.3 -0.36998345 -0.1350465000  0.11139882 1.00  2183
## s(year):seriesSERC_005.1 -0.25307870  0.0013571850  0.25817607 1.00   662
## s(year):seriesSERC_005.2 -0.15864710  0.0043028400  0.20955637 1.00   778
## s(year):seriesSERC_005.3 -0.26075870  0.0114702500  0.27925480 1.00  1889
## s(year):seriesSERC_006.1 -0.24979230  0.0115918000  0.35415422 1.01   594
## s(year):seriesSERC_006.2 -0.45464510 -0.0202708500  0.15257440 1.04    96
## s(year):seriesSERC_006.3 -0.36726378 -0.0623165000  0.21786755 1.00  1934
## s(year):seriesSERC_012.1 -0.20584263  0.0098190850  0.31868792 1.00   972
## s(year):seriesSERC_012.2 -0.51082082 -0.0262158500  0.11929512 1.01   280
## s(year):seriesSERC_012.3 -0.32940870 -0.0633670000  0.18934872 1.00  2057
## s(season).1              -4.60221675 -2.6956550000 -0.63381890 1.01   487
## s(season).2              -3.87464450 -2.2805900000 -0.75353288 1.00  1674
## s(season).3              -1.21271800  0.0248714000  1.24067175 1.01   490
## s(season).4               2.53083725  3.7712450000  5.09350200 1.01   370
## s(season).5               2.97895000  4.2161250000  5.57628075 1.01   362
## s(season).6               1.54303875  2.7942950000  4.20358175 1.01   359
## s(season).7               0.68651708  1.9464000000  3.32795350 1.01   372
## s(season).8               0.57596002  1.8305600000  3.24455125 1.01   359
## s(season).9              -0.50795605  0.8143125000  2.20261050 1.01   495
## s(season).10             -2.48854600 -1.1499950000 -0.10457021 1.00  1447
## s(season,series).1       -0.35625685 -0.0213177500  0.32560960 1.00  3775
## s(season,series).2       -0.37516792 -0.0302453500  0.28846037 1.00  2912
## s(season,series).3       -0.24920895  0.0691557000  0.45552777 1.00  1843
## s(season,series).4       -0.33706127 -0.0119775000  0.30869675 1.00  3350
## s(season,series).5       -0.43121465 -0.0632293000  0.23510938 1.00  1574
## s(season,series).6       -0.19362230  0.0989866500  0.45920127 1.00  1486
## s(season,series).7       -0.20963913  0.0758753500  0.44571172 1.00  1547
## s(season,series).8       -0.20429955  0.0879817500  0.45256872 1.00  1332
## s(season,series).9       -0.17469060 -0.0085882300  0.18955810 1.00  1158
## s(season,series).10      -0.33895928 -0.0571661000  0.16732022 1.00   962
## s(season,series).11      -1.01957400 -0.1709160000  0.39859812 1.01   651
## s(season,series).12      -0.32427835  0.0130102500  0.37134910 1.00  3200
## s(season,series).13      -0.36529302 -0.0129381500  0.35003647 1.00  3591
## s(season,series).14      -0.40531960 -0.0262112000  0.32115142 1.00  2373
## s(season,series).15      -0.33022077  0.0125682000  0.35206315 1.00  3158
## s(season,series).16      -0.33239430  0.0208528500  0.38893615 1.00  3350
## s(season,series).17      -0.35812967  0.0013094100  0.36372692 1.00  4125
## s(season,series).18      -0.42188570 -0.0714960000  0.24267325 1.00  1791
## s(season,series).19      -0.26645165  0.0402538000  0.38816580 1.00  2411
## s(season,series).20      -0.03858393  0.1709615000  0.39249917 1.00  1345
## s(season,series).21      -0.07338130  0.1757720000  0.45547862 1.00   919
## s(season,series).22      -0.22665320  0.2425925000  1.30728725 1.01   437
## s(season,series).23      -0.35390555 -0.0048757050  0.34613935 1.00  3554
## s(season,series).24      -0.32513945  0.0172933500  0.36022980 1.00  3320
## s(season,series).25      -0.35671100 -0.0026254900  0.34550377 1.00  3940
## s(season,series).26      -0.39214115 -0.0260037500  0.30754960 1.00  3418
## s(season,series).27      -0.42353157 -0.0290733500  0.27944907 1.00  2063
## s(season,series).28      -0.37535160 -0.0263802500  0.32309727 1.00  2645
## s(season,series).29      -0.31725928  0.0284070000  0.39306990 1.00  2911
## s(season,series).30      -0.44689290 -0.0746462000  0.22751442 1.00  2158
## s(season,series).31      -0.29135385 -0.0700224000  0.14093620 1.00  1777
## s(season,series).32      -0.31398208 -0.0547024000  0.18235562 1.00  1913
## s(season,series).33      -0.31655227  0.1793615000  1.18017200 1.01   674
## s(season,series).34      -0.38918152 -0.0193240000  0.33608202 1.00  2954
## s(season,series).35      -0.38376967 -0.0109425500  0.33512782 1.00  2795
## s(season,series).36      -0.34202595  0.0145714500  0.39885490 1.00  2424
## s(season,series).37      -0.36551628 -0.0053718650  0.33652560 1.00  4437
## s(season,series).38      -0.37127277 -0.0219009000  0.34121055 1.00  2868
## s(season,series).39      -0.40123223 -0.0503279500  0.29880837 1.00  3504
## s(season,series).40      -0.27503210  0.0421248000  0.38365070 1.00  3073
## s(season,series).41      -0.51305395 -0.1501530000  0.14712077 1.00  1438
## s(season,series).42      -0.34342898 -0.1284690000  0.05658857 1.00  1544
## s(season,series).43      -0.52050403 -0.2371570000 -0.01964441 1.00   890
## s(season,series).44      -0.60838550  0.0546282500  0.97552257 1.02   426
## s(season,series).45      -0.36933283  0.0010602650  0.37992592 1.00  2850
## s(season,series).46      -0.35484000 -0.0174725000  0.32487175 1.00  3336
## s(season,series).47      -0.35240970  0.0003380025  0.34383112 1.00  3349
## s(season,series).48      -0.28413435  0.0352413500  0.41988950 1.00  2914
## s(season,series).49      -0.33416015  0.0169815000  0.37067215 1.00  3427
## s(season,series).50      -0.34830900  0.0011588650  0.32977187 1.00  3678
## s(season,series).51      -0.26769382  0.0471017000  0.39919967 1.00  1880
## s(season,series).52      -0.25536167  0.0587957500  0.41237680 1.00  2220
## s(season,series).53      -0.31041887 -0.0947135000  0.09755289 1.00  1735
## s(season,series).54      -0.14542462  0.0810408500  0.34234007 1.00  1582
## s(season,series).55      -0.81843840 -0.0669486500  0.51544605 1.00  1051
## s(season,series).56      -0.32083762  0.0063197500  0.35536540 1.00  3414
## s(season,series).57      -0.34681590  0.0229721500  0.40249220 1.00  3148
## s(season,series).58      -0.39069355 -0.0054624500  0.34672620 1.00  4756
## s(season,series).59      -0.37275437 -0.0048642700  0.31977967 1.00  2283
## s(season,series).60      -0.34143137  0.0196640500  0.37702707 1.00  3265
## s(season,series).61      -0.32586020  0.0069612950  0.35320287 1.00  3252
## s(season,series).62      -0.36457197 -0.0355538000  0.26730482 1.00  2553
## s(season,series).63      -0.27226647  0.0431529000  0.41054947 1.00  3050
## s(season,series).64      -0.18359665  0.0122242500  0.22465757 1.00  1773
## s(season,series).65      -0.17001873  0.0586154000  0.32956372 1.00  2053
## s(season,series).66      -0.47604525  0.0507283500  0.84374972 1.01   896
## s(season,series).67      -0.34596057  0.0032043550  0.34838725 1.00  2659
## s(season,series).68      -0.33060555  0.0057969100  0.35154177 1.00  3249
## s(season,series).69      -0.38776042 -0.0225848500  0.30058752 1.00  2180
## s(season,series).70      -0.34259217 -0.0013502650  0.35416530 1.00  4436
## s(season,series).71      -0.34425322  0.0099933350  0.36965292 1.00  4375
## s(season,series).72      -0.38846008 -0.0366841000  0.26900858 1.00  2601
## s(season,series).73      -0.34167540 -0.0027203900  0.34817360 1.00  2563
## s(season,series).74      -0.39453387 -0.0335988500  0.25046237 1.00  2302
## s(season,series).75      -0.09950190  0.0953016000  0.30386042 1.00  1340
## s(season,series).76      -0.19329995  0.0353737000  0.26834070 1.00  1732
## s(season,series).77      -0.50755215  0.0538587500  0.83772642 1.00   875
## s(season,series).78      -0.33320730  0.0084113000  0.35139757 1.00  3905
## s(season,series).79      -0.30402360  0.0075899500  0.34517055 1.00  3649
## s(season,series).80      -0.35333555 -0.0100852000  0.31056380 1.00  3267
## s(season,series).81      -0.34137932 -0.0005845210  0.35838357 1.00  3507
## s(season,series).82      -0.37787385  0.0020199350  0.34296740 1.00  2804
## s(season,series).83      -0.33326138  0.0113174000  0.37660680 1.00  3143
## s(season,series).84      -0.35053160 -0.0218668000  0.30452312 1.00  2554
## s(season,series).85      -0.28940668  0.0209943500  0.36343387 1.00  2599
## s(season,series).86      -0.17130648  0.0174738500  0.21638345 1.00  1607
## s(season,series).87      -0.27243125 -0.0351082000  0.19372275 1.00  1570
## s(season,series).88      -0.76498190 -0.0304329000  0.64339982 1.00   697
## s(series).1              -3.05521875 -2.0554900000 -0.97690213 1.01   484
## s(series).2              -3.69495075 -2.4820150000 -1.50253750 1.00   541
## s(series).3              -3.50613750 -2.4393500000 -1.47885050 1.01   569
## s(series).4              -3.40072325 -2.3212400000 -1.40114900 1.00   508
## s(series).5              -3.18165125 -2.1343450000 -1.13635550 1.01   517
## s(series).6              -3.34071775 -2.3175450000 -1.34755575 1.00   569
## s(series).7              -3.34859475 -2.2865250000 -1.30548700 1.00   512
## s(series).8              -3.17813400 -2.1801900000 -1.23509075 1.00   503
## 
## GAM group-level estimates:
##                     2.5%       50%     97.5% Rhat n.eff
## mean(series) -3.15537450 -2.223965 -1.395747 1.00   477
## sd(series)    0.01483403  0.305158  1.004667 1.01   588
## 
## GAM observation smoothing parameter (rho) estimates:
##                                   2.5%        50%      97.5% Rhat n.eff
## s(year):seriesBLAN_005_rho  -4.1894487 -2.2597100 -0.6304378 1.00  1440
## s(year):seriesBLAN_0052_rho  0.3379950  3.0423650  4.1687652 1.00   744
## s(year):seriesBLAN_012_rho   0.3158627  3.0105550  4.1397975 1.01   494
## s(year):seriesBLAN_0122_rho  0.3657941  3.0283600  4.1223352 1.00   889
## s(year):seriesSCBI_002_rho  -0.1638110  3.0519950  4.1378757 1.01   488
## s(year):seriesSCBI_0022_rho  1.0611125  3.1844800  4.1746617 1.00  1190
## s(year):seriesSCBI_013_rho  -5.3521338 -3.0305500  2.4290537 1.01   340
## s(year):seriesSCBI_0132_rho -2.0641677  0.8275115  3.7360925 1.01   556
## s(year):seriesSERC_001_rho   0.7203857  3.1229850  4.1555885 1.01   597
## s(year):seriesSERC_0012_rho  0.8792811  3.1639500  4.1955035 1.00  1179
## s(year):seriesSERC_005_rho   0.5922350  3.0690300  4.1370322 1.01   456
## s(year):seriesSERC_0052_rho  1.3045397  3.2323600  4.2003410 1.00  1426
## s(year):seriesSERC_006_rho  -0.1142746  2.9925300  4.1343065 1.04   154
## s(year):seriesSERC_0062_rho  1.0473230  3.2196050  4.1957000 1.00  1051
## s(year):seriesSERC_012_rho  -0.4364665  2.9816500  4.1725762 1.01   278
## s(year):seriesSERC_0122_rho  1.3256547  3.2527250  4.1944985 1.00   777
## s(season)_rho                0.5461381  1.7107850  2.6579075 1.00  1506
## s(season,series)_rho        -2.3936008 -1.8172450 -1.2705300 1.02   122
## s(season,series)2_rho       -3.6722827 -1.0114800  0.1569283 1.04   160
## s(series)_rho               -0.1763533  3.0849600  4.1459552 1.01   764
## 
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 4 of 2000 iterations ended with a divergence (0.2%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior

A plot of the smooth functions from this model now shows the “global” smooth of season as well as the eight plot-level deviation smooths. When plotting fs smooths, I find it much easier to visualize patterns using the gratia::draw plot method:

gratia::draw(model2$mgcv_model, select = c('s(season)', 's(season,series)'))  

This plot shows that plot-level deviations from the “global” smooth are supported for many of the plot-level time series. But as we add more and more smooths, it becomes harder to understand what the model is trying to tell us. We can again repeat the plots from above to see if the model does a better job of capturing seasonal variation:

plot_predictions(model2, condition = c('season', 'series', 'series'),
                 points = 0.5, rug = TRUE) +
  facet_wrap(~series, ncol = 2, scales = 'free_y') +
  theme(legend.position = 'none')

The conditional smooth predictions look more realistic now than in the previous two models. This is a good sign. The plot of median residuals vs seasonality now looks better, with less evidence of remaining systematic variation:

plot_season_resids(model2)

As do the partial residuals:

plot_season_partials(object = model2)

Exercises

  1. Fit each of the above models using mgcv and interrogate them (hint: use method = 'REML' for the best smoothing parameter selection results).
  2. If you were selecting among the three mgcv models using approximate hypothesis tests with a generalized likelihood ratio test (using anova.gam), which model would you end up with? (hint: use test = 'Chisq' to perform the model comparisons)
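The mechanics of the comparison in exercise 2 look like the following sketch, shown here on simulated data rather than the tick models (the objects `m_small` and `m_large` are hypothetical):

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400)

# two nested candidate models fitted with REML smoothness selection
m_small <- gam(y ~ s(x0) + s(x1), data = dat, method = 'REML')
m_large <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat, method = 'REML')

# approximate generalized likelihood ratio test between the nested GAMs
anova(m_small, m_large, test = 'Chisq')
```

Keep in mind that these tests are approximate for penalized smooths, so treat small p-value differences with caution.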

Multivariate dynamics

The above steps helped identify a model that is complex enough to learn shared seasonal patterns from all series while also capturing how seasonality varies among them. But we would prefer to replace the smooth term of year with a time series model that will likely give better and more sensible forecasts. The models above consistently estimated declining trends in tick abundance over time for most of the series. This shouldn't be too surprising: large-scale climate variation, such as periods of drier or warmer conditions, could have wide-ranging impacts on tick abundances and activity rates across many of the NEON sites, so we would generally expect some similarity in trends among these sampling locations. We can capitalize on these expected correlations using a dynamic factor model for the trends. In the model below, we build on model2 by adding a dynamic factor model for the long-term trends, replacing the series-specific smooths of year that we previously used to capture long-term variation. Because of the strong similarities in trends, we shouldn't need many latent factors. And because we need to forecast around 52 timepoints ahead, we should use a temporal model with longer "memory" than, say, an AR1 or even an AR3 model. Instead, we will use Gaussian Processes to capture the smooth dynamic factors:

model3 <- mvgam(y ~                  
                  #hierarchical intercepts per series (plotID)
                  s(series, bs = 're') +
                  
                  # a "global" smooth of season to capture the overall
                  # seasonal pattern
                  s(season, bs = 'cc', k = 12) +
                  
                  # a set of "deviation" smooths to capture plot-level 
                  # variation in seasonal patterns; here we use the 
                  # fs basis to force each plot-level deviation smooth to 
                  # use the same smoothing penalty, which makes the
                  # deviations behave more like 'random effects'
                  s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 12),
                
                # knots to ensure the seasonal smooth joins between the last 
                # week and first week of the year
                knots = list(season = c(0.5, 52.5)),
                
                # Poisson observations
                family = poisson(),
                data = tick_data$data_train,
                newdata = tick_data$data_test,
                
                # A dynamic factor trend with three latent factors
                trend_model = 'GP',
                use_lv = TRUE,
                n_lv = 3)

The model can be described mathematically as follows:

\[\begin{align*} \boldsymbol{ticks}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \alpha_{plot} + f_{global}(\boldsymbol{epiweek})_t + f_{plot}(\boldsymbol{epiweek})_t + z_{t} \\ \alpha_{plot} & \sim \text{Normal}(\mu_{plot}, \sigma_{plot}) \\ \mu_{plot} & \sim \text{Normal}(0, 1) \\ \sigma_{plot} & \sim \text{Exponential}(0.5) \\ f_{global}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{global,k} * \beta_{global,k} \\ f_{plot}(\boldsymbol{epiweek}) & = \sum_{k=1}^{K}b_{plot,k} * \beta_{plot,k} \\ z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * (|t_i - t_j| / \rho)^2) \\ \alpha & \sim \text{StudentT}(3, 0, 2.5)[0, ] \\ \rho & \sim \text{InverseGamma}(1.5, 9) \end{align*}\]
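The squared exponential kernel above is easy to compute directly. A minimal base R sketch (with purely illustrative values of \(\alpha\) and \(\rho\)) shows how the length scale \(\rho\) controls the "memory" of the latent factors:

```r
# squared exponential covariance between all pairs of timepoints
sq_exp_cov <- function(times, alpha, rho) {
  d <- as.matrix(dist(times))   # pairwise distances |t_i - t_j|
  alpha^2 * exp(-0.5 * (d / rho)^2)
}

times <- 1:25

# larger rho -> correlations decay more slowly -> longer memory
K_short <- sq_exp_cov(times, alpha = 1, rho = 2)
K_long  <- sq_exp_cov(times, alpha = 1, rho = 10)

# covariance between timepoints 1 and 10 under each length scale
K_short[1, 10]
K_long[1, 10]
```

With rho = 2 the covariance between points nine steps apart is essentially zero, while with rho = 10 it remains substantial, which is why GP trends can propagate information further into the forecast horizon than low-order AR models.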

The model summary:

summary(model3)
## GAM formula:
## y ~ s(season, bs = "cc", k = 12) + s(season, series, bs = "fs", 
##     xt = list(bs = "cc"), k = 12) + s(series, bs = "re")
## 
## Family:
## poisson
## 
## Link function:
## log
## 
## Trend model:
## GP
## 
## N latent factors:
## 3 
## 
## N series:
## 8 
## 
## N timepoints:
## 208 
## 
## Status:
## Fitted using Stan 
## 
## GAM coefficient (beta) estimates:
##                            2.5%           50%       97.5% Rhat n.eff
## (Intercept)          0.58615835  0.7871810000  0.98005655 1.00  2412
## s(season).1         -4.70780875 -2.6491600000 -0.71995933 1.01   612
## s(season).2         -4.08723875 -2.3046350000 -0.82180383 1.00  1838
## s(season).3         -1.23693300 -0.0288453000  1.15183500 1.01   690
## s(season).4          2.50099650  3.7025000000  4.93778475 1.01   495
## s(season).5          2.97805825  4.1819600000  5.43179375 1.01   484
## s(season).6          1.61185675  2.8028200000  4.14082525 1.01   481
## s(season).7          0.72855195  1.9421900000  3.28434750 1.01   498
## s(season).8          0.59228810  1.7979350000  3.16813000 1.01   519
## s(season).9         -0.43116745  0.8396805000  2.22339125 1.00   610
## s(season).10        -2.34841500 -1.0962800000 -0.03927176 1.00  1305
## s(season,series).1  -0.40484657 -0.0227245000  0.32998975 1.00  3730
## s(season,series).2  -0.37339678 -0.0284949500  0.29636547 1.00  2680
## s(season,series).3  -0.28497250  0.0716485500  0.46426455 1.00  1382
## s(season,series).4  -0.33389133 -0.0054152450  0.31051877 1.00  2739
## s(season,series).5  -0.41882022 -0.0676727500  0.23597985 1.00  1470
## s(season,series).6  -0.23248205  0.0977878000  0.51769700 1.00   844
## s(season,series).7  -0.20434683  0.0801303500  0.42245152 1.00  1301
## s(season,series).8  -0.21118710  0.0862054000  0.44363740 1.00  1689
## s(season,series).9  -0.19790232 -0.0265665500  0.17609435 1.00  1213
## s(season,series).10 -0.33942957 -0.0628726500  0.16629870 1.00  1027
## s(season,series).11 -1.17353575 -0.1666415000  0.41083767 1.00   782
## s(season,series).12 -0.33408140  0.0077858500  0.38840062 1.00  2917
## s(season,series).13 -0.36844108 -0.0123762000  0.31872317 1.00  2992
## s(season,series).14 -0.35494562 -0.0209550500  0.29789735 1.00  3140
## s(season,series).15 -0.29472863  0.0129395000  0.36664697 1.00  3714
## s(season,series).16 -0.29243025  0.0231898000  0.36258840 1.00  3393
## s(season,series).17 -0.33242223  0.0009086120  0.34731560 1.00  3313
## s(season,series).18 -0.42762648 -0.0670722000  0.23556525 1.00  1999
## s(season,series).19 -0.27404662  0.0474219000  0.38766167 1.00  2514
## s(season,series).20 -0.04490569  0.1689710000  0.42165808 1.00  1089
## s(season,series).21 -0.07876577  0.1719815000  0.46205333 1.00   736
## s(season,series).22 -0.28059675  0.2352315000  1.41161175 1.01   325
## s(season,series).23 -0.37770160 -0.0043807750  0.35000060 1.00  2596
## s(season,series).24 -0.30351892  0.0273047000  0.34122432 1.00  3866
## s(season,series).25 -0.34918528 -0.0081251600  0.32237470 1.00  3067
## s(season,series).26 -0.39477267 -0.0277249500  0.30817622 1.00  3145
## s(season,series).27 -0.44560973 -0.0315534500  0.33244722 1.00  1458
## s(season,series).28 -0.38417785 -0.0304177000  0.29947650 1.00  2884
## s(season,series).29 -0.29816867  0.0262799000  0.35836490 1.00  2320
## s(season,series).30 -0.44866405 -0.0784662000  0.21855832 1.00  1077
## s(season,series).31 -0.28468605 -0.0668353500  0.15019447 1.00  1725
## s(season,series).32 -0.35142690 -0.0623228500  0.18185247 1.00  1782
## s(season,series).33 -0.39484507  0.1777330000  1.38788800 1.01   325
## s(season,series).34 -0.35302807 -0.0084182350  0.32252642 1.00  3286
## s(season,series).35 -0.39158130 -0.0151093500  0.34316805 1.00  4473
## s(season,series).36 -0.30114630  0.0208318000  0.36938590 1.00  2954
## s(season,series).37 -0.37012567 -0.0006568435  0.36840737 1.00  4387
## s(season,series).38 -0.34699923 -0.0224966000  0.30781548 1.00  3255
## s(season,series).39 -0.37931035 -0.0404031500  0.27986810 1.00  3323
## s(season,series).40 -0.25714217  0.0360393000  0.35703805 1.00  2307
## s(season,series).41 -0.50642335 -0.1440080000  0.13827242 1.00  1387
## s(season,series).42 -0.32748610 -0.1208015000  0.06909645 1.00  1450
## s(season,series).43 -0.50814800 -0.2259700000 -0.00784427 1.00  1024
## s(season,series).44 -0.55914110  0.0393400000  0.94284437 1.00   561
## s(season,series).45 -0.33495760 -0.0021194600  0.33202757 1.00  3305
## s(season,series).46 -0.38389808 -0.0178334500  0.32823062 1.00  3937
## s(season,series).47 -0.36242510 -0.0020264000  0.36494892 1.00  2061
## s(season,series).48 -0.29568030  0.0322039500  0.42753682 1.00  2937
## s(season,series).49 -0.30158602  0.0109151500  0.34931530 1.00  3275
## s(season,series).50 -0.35833185 -0.0021678000  0.34291225 1.00  3570
## s(season,series).51 -0.24205235  0.0576910500  0.39149652 1.00  2610
## s(season,series).52 -0.26262917  0.0544315500  0.37383960 1.00  3248
## s(season,series).53 -0.29257898 -0.0937942500  0.09423598 1.00  1662
## s(season,series).54 -0.13261885  0.0822211500  0.34296775 1.00  1619
## s(season,series).55 -0.83473280 -0.0639224500  0.68261352 1.00   670
## s(season,series).56 -0.37272170  0.0116546000  0.40275452 1.00  2453
## s(season,series).57 -0.30771595  0.0242445500  0.38603795 1.00  2733
## s(season,series).58 -0.33899460 -0.0085738050  0.34307280 1.00  2731
## s(season,series).59 -0.42061622 -0.0109843000  0.32486660 1.00  1755
## s(season,series).60 -0.33590205  0.0097444050  0.35293267 1.00  3305
## s(season,series).61 -0.32685515  0.0120693500  0.36376505 1.00  2872
## s(season,series).62 -0.37161407 -0.0302758000  0.28821612 1.00  2835
## s(season,series).63 -0.26941155  0.0452242000  0.37829415 1.00  2879
## s(season,series).64 -0.19831092  0.0066457000  0.21168675 1.00  1666
## s(season,series).65 -0.18904392  0.0506976500  0.30441055 1.00  2456
## s(season,series).66 -0.56197030  0.0401222000  0.90268142 1.00   658
## s(season,series).67 -0.34938663  0.0057726200  0.33612637 1.00  3257
## s(season,series).68 -0.32289695  0.0044573900  0.34315400 1.00  3325
## s(season,series).69 -0.37880937 -0.0168224500  0.31032227 1.00  3951
## s(season,series).70 -0.36082400 -0.0035328300  0.35110730 1.00  3496
## s(season,series).71 -0.34553782  0.0211764000  0.39129060 1.00  3154
## s(season,series).72 -0.39676257 -0.0396560000  0.28488052 1.00  1728
## s(season,series).73 -0.33820358 -0.0128438000  0.29757500 1.00  2808
## s(season,series).74 -0.38346663 -0.0330111500  0.26606905 1.00  1859
## s(season,series).75 -0.09678753  0.1009990000  0.31027447 1.00  1572
## s(season,series).76 -0.18395082  0.0480632500  0.30334073 1.00  2203
## s(season,series).77 -0.55206640  0.0492318000  1.03317600 1.00   787
## s(season,series).78 -0.35822728  0.0082043950  0.36750965 1.00  2628
## s(season,series).79 -0.35717260  0.0087550450  0.38582285 1.00  3343
## s(season,series).80 -0.34773597 -0.0180159000  0.34610005 1.00  3449
## s(season,series).81 -0.35377877 -0.0029722500  0.35069215 1.00  2839
## s(season,series).82 -0.35028570  0.0006003630  0.32511420 1.00  2333
## s(season,series).83 -0.33935805  0.0163939500  0.35389387 1.00  1881
## s(season,series).84 -0.36737085 -0.0204510000  0.29307537 1.00  2314
## s(season,series).85 -0.26522020  0.0307177000  0.35584490 1.00  2626
## s(season,series).86 -0.16456792  0.0281232000  0.20999625 1.00  1716
## s(season,series).87 -0.25712770 -0.0208150000  0.21612155 1.00  1755
## s(season,series).88 -0.79127500 -0.0267223500  0.71640862 1.00  1063
## s(series).1         -3.11595250 -2.1198650000 -1.04202675 1.00   757
## s(series).2         -3.76937000 -2.5714650000 -1.48451125 1.01   442
## s(series).3         -3.74754875 -2.5509100000 -1.52925400 1.01   466
## s(series).4         -3.57947100 -2.3996950000 -1.40331650 1.01   569
## s(series).5         -3.17097950 -2.2198200000 -1.24990475 1.01   671
## s(series).6         -3.31621150 -2.3853900000 -1.34607725 1.00   715
## s(series).7         -3.45084850 -2.3957100000 -1.43272000 1.01   704
## s(series).8         -3.24022225 -2.2694700000 -1.37358575 1.00   687
## 
## GAM group-level estimates:
##                     2.5%        50%     97.5% Rhat n.eff
## mean(series) -3.15153175 -2.3041850 -1.371777 1.01   545
## sd(series)    0.01156674  0.3338975  1.222821 1.01   251
## 
## GAM observation smoothing parameter (rho) estimates:
##                             2.5%       50%      97.5% Rhat n.eff
## s(season)_rho          0.5714946  1.753395  2.6774647 1.00  1531
## s(season,series)_rho  -2.3770277 -1.823235 -1.2795680 1.04   114
## s(season,series)2_rho -3.4591283 -0.985094  0.2480292 1.02   153
## s(series)_rho          0.1967214  3.047490  4.0951555 1.00   778
## 
## Latent trend length scale (rho) estimates:
##               2.5%      50%    97.5% Rhat n.eff
## rho_gp[1] 1.799521 6.655455 24.78586 1.00  1554
## rho_gp[2] 1.304594 5.540665 34.79403 1.01   336
## rho_gp[3] 1.416699 6.240680 27.86827 1.00  1025
## 
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhats above 1.05 found for 11 parameters
## *Diagnose further to investigate why the chains have not mixed
## 10 of 2000 iterations ended with a divergence (0.5%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior

We can plot the smooth effects again to confirm that they have not changed much with the inclusion of the dynamic factor Gaussian Process trends:

gratia::draw(model3$mgcv_model, select = c('s(season)', 's(season,series)'))  

Examining dynamic factors

When working with dynamic factor models, it is useful to plot the latent factors themselves. This plot shows that the three factors are capturing different temporal patterns:

plot_mvgam_factors(model3)

##         Contribution
## Factor1    0.2940309
## Factor2    0.3875629
## Factor3    0.3184062

Each time series has a set of coefficients (or weights) that control how it depends on these latent factors. It is this combination of weights, which must be estimated by the model, that controls the overall dynamics of the time series. The weights have some restrictions to ensure the model is identifiable, but we won’t go into those details here. You can view the posterior weights using bayesplot:

mcmc_hist(model3$model_output, regex_pars = 'lv_coefs')
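The weighted-sum construction described above can be sketched directly. This is not mvgam's internal code, just a minimal illustration with made-up factors and loadings showing how each series' trend is assembled from a small number of shared latent factors:

```r
# Minimal sketch of how a dynamic factor model assembles series-specific
# trends: each trend is a weighted sum of shared latent factors.
# All values below are hypothetical, not taken from model3
set.seed(1)
n_time <- 20; n_factors <- 3; n_series <- 8

# latent factors: one random walk per column
factors <- apply(matrix(rnorm(n_time * n_factors), n_time), 2, cumsum)

# loading matrix: one row of weights per series
loadings <- matrix(rnorm(n_series * n_factors, sd = 0.5), n_series)

# trend for each series = factors weighted by that series' loadings
trends <- factors %*% t(loadings)
dim(trends)  # n_time rows by n_series columns
```

Series whose rows of `loadings` point in similar directions end up with positively correlated columns in `trends`, which is exactly the property summarised from the posterior in the next step.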

These loading weights induce correlations among the dynamic trends of our time series (series with more similar loading weights will have more strongly correlated dynamic trends). We can take advantage of this property to inspect the induced correlation matrix:

correlations <- lv_correlations(object = model3)

The mean posterior correlation matrix can be plotted using ggplot2 (note, this is the only time in the workshop when I’ll use tidyr functionality, so if you do not have it installed then don’t worry about it!):

mean_correlations <- correlations$mean_correlations
mean_correlations[upper.tri(mean_correlations)] <- NA
mean_correlations <- data.frame(mean_correlations)
mean_correlations %>%
  tibble::rownames_to_column("series1") %>%
  tidyr::pivot_longer(-c(series1), 
                      names_to = "series2", 
                      values_to = "Correlation") -> mean_correlations
ggplot(mean_correlations,
       aes(x = series1, y = series2)) + geom_tile(aes(fill = Correlation)) +
  scale_fill_gradient2(low="darkred", mid="white", high="darkblue",
                       midpoint = 0,
                       breaks = seq(-1,1,length.out = 5),
                       limits = c(-1, 1),
                       name = 'Trend\ncorrelation') + labs(x = '', y = '') +
  theme(axis.text.x = element_text(angle = 45, hjust=1))

This plot shows that a lot of structure is being captured by our dimension-reduced dynamic factor model. For example, series SERC_012 (series 8) and BLAN_005 (series 1) show a fairly strong negative correlation:

plot(model3, type = 'trend', series = 1)

plot(model3, type = 'trend', series = 8)

While two of the SERC plots (SERC_006 and SERC_012) show a fairly strong positive correlation:

plot(model3, type = 'trend', series = 7)

plot(model3, type = 'trend', series = 8)

Because of these positively correlated trends, the forecasts for these two series will be similar:

plot(model3, type = 'forecast', series = 7)

plot(model3, type = 'forecast', series = 8)

This is an extremely useful and practical property of dynamic factor models: we can learn complex correlation structures that are informed by all our data, yielding reasonable forecasts that respect those correlation structures.

Comparing forecasts

But a key question is whether this more complex dynamic factor model can produce better forecasts than the much simpler model that used smooth functions of year. Here we will calculate the out-of-sample DRPS for each series and then simply take the sums as a measure of forecast performance. We don’t have much choice for computing more complex multivariate scores in this example because many of the individual NEON plots were not sampled at the same times in the out-of-sample period. First, we use the forecast.mvgam and score.mvgam_forecast functions to compute the out-of-sample DRPS for the last three models:

model1_scores <- score(forecast(model1), score = 'drps')
model2_scores <- score(forecast(model2), score = 'drps')
model3_scores <- score(forecast(model3), score = 'drps')
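For intuition, the DRPS for a single observation measures the squared distance between the forecast’s cumulative distribution and the step-function CDF of the observed value, summed over a grid of count cutpoints. A minimal sketch (the cutpoint grid and forecast draws are hypothetical choices; `score()` handles all of this internally):

```r
# Discrete rank probability score for one observation: squared distance
# between the empirical forecast CDF and the observed value's indicator
# CDF, accumulated over a (hypothetical) grid of count cutpoints
drps_score <- function(draws, truth, cutpoints = 0:50) {
  Fhat <- sapply(cutpoints, function(x) mean(draws <= x))  # forecast CDF
  Iobs <- as.numeric(cutpoints >= truth)                   # observed CDF
  sum((Fhat - Iobs)^2)
}

# a forecast centred on the truth scores lower (better) than one that is not
drps_score(draws = rpois(1000, lambda = 4), truth = 4)
drps_score(draws = rpois(1000, lambda = 15), truth = 4)
```

Lower scores are better, and a perfectly sharp forecast exactly at the observed value scores zero.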

Now we simply loop across each series, sum the non-NA scores, and then take an overall sum. The model that achieves the lowest total score is the best in this case:

sum(unlist(lapply(1:8, function(x){
  sum(model1_scores[[x]]$score, na.rm = TRUE)
})))
## [1] 101.398
sum(unlist(lapply(1:8, function(x){
  sum(model2_scores[[x]]$score, na.rm = TRUE)
})))
## [1] 97.2371
sum(unlist(lapply(1:8, function(x){
  sum(model3_scores[[x]]$score, na.rm = TRUE)
})))
## [1] 95.59081

The dynamic factor model (model3) gives the best out-of-sample forecast performance in this case, which should not be surprising given our understanding of how problematic it can sometimes be to use spline extrapolation for producing forecasts.

Exercises

  1. Fit a version of model3 that uses Negative Binomial observations in place of the Poisson. Look at the estimates for the overdispersion parameters \(\phi\) and comment on what they are capturing
  2. Fit another version of the model but this time constrain the GP length scale parameters \(\rho\) using a \(Normal(12, 2)\) prior. Do any of your conclusions change?

State-Space VAR models

A final example in this tutorial will use a different dataset to showcase one of mvgam’s more advanced features: the ability to fit dynamic models in a State-Space representation. We have not gone into too much detail about how these models work (but see this outstanding resource from E. E. Holmes, M. D. Scheuerell, and E. J. Ward for many useful details, along with this nice lecture by E. E. Holmes).

Illustration of a basic State-Space model, which assumes that a latent dynamic process (X) can evolve independently from the way we take observations (Y) of that process


Briefly, these models allow us to make separate inferences about the underlying dynamic process model that we are interested in (i.e. the evolution of a time series or a collection of time series) and the observation model (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are actually interested in. It is also helpful because we often know that some covariates will impact our ability to measure accurately (for example, we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates impact the underlying process (rodent abundance is highly unlikely to respond to a single storm, but probably does respond to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of mvgam is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes. I am not aware of any other packages that can easily do this, but of course there may be some.

Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in mvgam are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the MARSS package and can be downloaded using the following:

load(url('https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda'))

We will work with five different groups of plankton:

outcomes <- c('Greens', 'Bluegreens', 'Diatoms', 'Unicells', 'Other.algae')

As usual, preparing the data into the correct format for mvgam modelling takes a little bit of wrangling in dplyr:

# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x){
  
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(year = lakeWAplanktonTrans[, 'Year'],
             month = lakeWAplanktonTrans[, 'Month'],
             y = lakeWAplanktonTrans[, x],
             series = x,
             temp = lakeWAplanktonTrans[, 'Temp'])})) %>%
  
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

Inspect the data structure

head(plankton_data)
## # A tibble: 6 × 6
##    year month       y series       temp  time
##   <dbl> <dbl>   <dbl> <fct>       <dbl> <int>
## 1  1965     1 -0.542  Greens      -1.23     1
## 2  1965     1 -0.344  Bluegreens  -1.23     1
## 3  1965     1 -0.0768 Diatoms     -1.23     1
## 4  1965     1 -1.52   Unicells    -1.23     1
## 5  1965     1 -0.491  Other.algae -1.23     1
## 6  1965     2 NA      Greens      -1.32     2
dplyr::glimpse(plankton_data)
## Rows: 600
## Columns: 6
## $ year   <dbl> 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196…
## $ month  <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …
## $ y      <dbl> -0.54241769, -0.34410776, -0.07684901, -1.52243490, -0.49055442…
## $ series <fct> Greens, Bluegreens, Diatoms, Unicells, Other.algae, Greens, Blu…
## $ temp   <dbl> -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.…
## $ time   <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not strictly necessary; it is often better to build a model that respects the properties of the actual outcome variables).

plot_mvgam_series(data = plankton_data, series = 'all')

As usual, check the data for NAs:

image(is.na(t(plankton_data)), axes = FALSE,
      col = c('grey80', 'darkred'))
axis(3, at = seq(0,1, len = NCOL(plankton_data)), 
     labels = colnames(plankton_data))

We have some missing observations, but of course this isn’t an issue for modelling in mvgam. A useful property to understand about these counts is that they tend to be highly seasonal. Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

plankton_data %>%
  dplyr::filter(series == 'Other.algae') %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = 'white',
            size = 1.3) +
  geom_line(aes(y = y), col = 'darkred',
            size = 1.1) +
  ylab('z-score') +
  xlab('Time') +
  ggtitle('Temperature (black) vs Other algae (red)')

plankton_data %>%
  dplyr::filter(series == 'Diatoms') %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = 'white',
            size = 1.3) +
  geom_line(aes(y = y), col = 'darkred',
            size = 1.1) +
  ylab('z-score') +
  xlab('Time') +
  ggtitle('Temperature (black) vs Diatoms (red)')

plankton_data %>%
  dplyr::filter(series == 'Greens') %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = 'white',
            size = 1.3) +
  geom_line(aes(y = y), col = 'darkred',
            size = 1.1) +
  ylab('z-score') +
  xlab('Time') +
  ggtitle('Temperature (black) vs Greens (red)')

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

Now it is time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of this workshop).

Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model builds from our hierarchical GAMs above by introducing hierarchical multidimensional smooths. It includes a “global” tensor product of the month and temp variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group’s seasonality differs from the overall “global” seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

notrend_mod <- mvgam(y ~ 
                       # tensor of temp and month to capture
                       # "global" seasonality
                       te(temp, month, k = c(4, 4)) +
                       
                       # series-specific deviation tensor products
                       te(temp, month, k = c(4, 4), by = series),
                     family = gaussian(),
                     data = plankton_train,
                     newdata = plankton_test,
                     trend_model = 'None')

The “global” tensor product smooth function can be quickly visualized using gratia:

gratia::draw(notrend_mod$mgcv_model, select = 1)

We can then plot the deviation smooths for each algal group to see how they vary from the “global” pattern:

gratia::draw(notrend_mod$mgcv_model, select = 2)

gratia::draw(notrend_mod$mgcv_model, select = 3)

gratia::draw(notrend_mod$mgcv_model, select = 4)

gratia::draw(notrend_mod$mgcv_model, select = 5)

gratia::draw(notrend_mod$mgcv_model, select = 6)

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

plot(notrend_mod, type = 'forecast', series = 1)

plot(notrend_mod, type = 'forecast', series = 2)

plot(notrend_mod, type = 'forecast', series = 3)

plot(notrend_mod, type = 'forecast', series = 4)

plot(notrend_mod, type = 'forecast', series = 5)

Multiseries dynamics

The basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for each series:

plot(notrend_mod, type = 'residuals', series = 1)

plot(notrend_mod, type = 'residuals', series = 2)

plot(notrend_mod, type = 'residuals', series = 3)

plot(notrend_mod, type = 'residuals', series = 4)

plot(notrend_mod, type = 'residuals', series = 5)

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\[\begin{align*} \boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\ \mu_{obs[t]} & = \alpha + process_t \\ \sigma_{obs} & \sim \text{Uniform}(0.15, 1) \\ process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\ \mu_{process[t]} & = VAR * process_{t-1} + f_{global}(\boldsymbol{month},\boldsymbol{temp})_t + f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\ f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K}b_{global} * \beta_{global} \\ f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K}b_{series} * \beta_{series} \\ VAR & \sim \text{Normal}(0, 1) \\ \Sigma_{process} & = \text{diag}(\sigma_{process}) * \text{R} * \text{diag}(\sigma_{process}) \\ \text{R} & \sim \text{LKJcorr}(2) \end{align*}\]

Here you can see that we assume independent observation processes (there is no covariance structure in the observation errors \(\sigma_{obs}\)), but there is a lot going on in the underlying process model. This component has a Vector Autoregressive part, in which the process mean at time \(t\) \((\mu_{process[t]})\) is a vector that evolves as a function of where the vector-valued process model was at time \(t-1\). The \(VAR\) matrix captures these dynamics, with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals. The contemporaneous process errors are captured by \(\Sigma_{process}\), which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using Stan’s \(LKJcorr\) distribution to place a prior on the strength of inter-species correlations).
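To build intuition for what the \(VAR\) matrix does, here is a minimal simulation of a two-series VAR(1) process with independent process errors. The coefficient matrix below is entirely made up for illustration; in the models we fit, mvgam estimates these coefficients from the data:

```r
# Simulate a 2-series VAR(1): x_t = A %*% x_{t-1} + error.
# Diagonal entries of A are self-dependencies; off-diagonals are
# (possibly asymmetric) cross-dependencies. Values are hypothetical
set.seed(1)
A <- matrix(c( 0.7, -0.3,
               0.1,  0.6), nrow = 2, byrow = TRUE)
n_time <- 100
x <- matrix(0, n_time, 2)
for (t in 2:n_time) {
  x[t, ] <- A %*% x[t - 1, ] + rnorm(2, sd = 0.2)
}
head(x)
```

Here `A[1, 2] = -0.3` means that series 1 is pushed down by high past values of series 2, while `A[2, 1] = 0.1` gives a weaker positive effect in the other direction: an asymmetric cross-dependence of exactly the kind the off-diagonal A parameters capture in the model summary below.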


OK, that was a lot to take in. Let’s fit some models to inspect what is going on and what they assume. But first, we need to update mvgam’s default priors for the observation errors. By default, mvgam uses a fairly wide Student-T prior on this parameter because the package doesn’t know what range the observations will be on. But our observations are z-scored, so we do not expect very large observation errors; nor do we expect very small ones. And I can tell you that if the sampler starts exploring those values, wacky things can happen! So let’s update the prior for this parameter. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in mvgam:

priors <- get_mvgam_priors(
  # observation formula, which just uses an intercept
  y ~ 1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend),
  
  # VAR1 model with uncorrelated process errors
  trend_model = 'VAR1',
  family = gaussian(),
  data = plankton_train)

# update the sigma_obs prior so that it avoids very small
# and very large values that are nonsensical
priors$prior[10] <- "sigma_obs ~ uniform(0.15, 1);"
priors$new_lowerbound[10] <- 0.15
priors$new_upperbound[10] <- 1

Now we can fit the first model, which assumes that the process errors are contemporaneously uncorrelated:

var_mod <- mvgam(  
  # observation formula, which just uses an intercept
  y ~ 1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend),
  
  # VAR1 model with uncorrelated process errors
  trend_model = 'VAR1',
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  
  # include the updated priors
  priors = priors)

Inspecting SS models

This model’s summary is a bit different to the ones we have seen previously. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly the VAR coefficients (called A in the output) and the process errors (Sigma):

summary(var_mod)
## GAM observation formula:
## y ~ 1
## 
## GAM process formula:
## ~te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), 
##     by = trend)
## 
## Family:
## gaussian
## 
## Link function:
## identity
## 
## Trend model:
## VAR1
## 
## N process models:
## 5 
## 
## N series:
## 5 
## 
## N timepoints:
## 112 
## 
## Status:
## Fitted using Stan 
## 
## Observation error parameter estimates:
##                   2.5%       50%     97.5% Rhat n.eff
## sigma_obs[1] 0.1521387 0.2015990 0.2951338 1.01   202
## sigma_obs[2] 0.1687413 0.3382920 0.5261607 1.06    73
## sigma_obs[3] 0.7115792 0.8662465 0.9871557 1.04   105
## sigma_obs[4] 0.1625956 0.3100540 0.4585531 1.08    85
## sigma_obs[5] 0.2230158 0.4207145 0.5454296 1.04    74
## 
## GAM observation model coefficient (beta) estimates:
##                  2.5%        50%      97.5% Rhat n.eff
## (Intercept) -0.119489 -0.0336624 0.05745112 1.03   170
## 
## Process model VAR parameter estimates:
##               2.5%          50%      97.5% Rhat n.eff
## A[1,1]  0.08312810  0.414188000 0.69449455 1.01   312
## A[1,2] -1.03830525  0.206556500 1.59460250 1.24    14
## A[1,3] -0.52313318 -0.210351000 0.01803674 1.02   160
## A[1,4] -0.17626083  0.095510050 0.37896035 1.10    29
## A[1,5] -0.05344159  0.223701000 0.67536777 1.01   216
## A[2,1] -0.40406930 -0.044557400 0.20983435 1.06    86
## A[2,2] -0.18271000  0.669338000 0.98275420 1.19    27
## A[2,3] -0.21883420 -0.009720635 0.13086200 1.04   259
## A[2,4] -0.10599915  0.031593500 0.31218502 1.12    39
## A[2,5] -0.18751040  0.011747600 0.26797827 1.04   136
## A[3,1] -0.46703572 -0.166129000 0.01545619 1.02   178
## A[3,2] -0.40260890  0.029716350 0.51021377 1.02   361
## A[3,3]  0.55267030  0.757782500 0.89061135 1.01   298
## A[3,4] -0.05322876  0.067225050 0.23029365 1.02   385
## A[3,5] -0.03570706  0.143598000 0.41622580 1.01   228
## A[4,1] -0.62413893 -0.207627500 0.08906170 1.03   181
## A[4,2] -0.80433732  0.265618500 1.61584800 1.23    16
## A[4,3] -0.48510960 -0.152662000 0.07304780 1.03   124
## A[4,4]  0.44623855  0.721780500 0.94635962 1.09    34
## A[4,5] -0.11214335  0.199946500 0.71323992 1.03   153
## A[5,1] -0.40596303 -0.102042000 0.14250137 1.01   264
## A[5,2] -0.71013177  0.114929000 0.92595892 1.17    17
## A[5,3] -0.10782992  0.063915550 0.29544830 1.02   246
## A[5,4] -0.22606782 -0.038671500 0.14094652 1.04    95
## A[5,5]  0.43603985  0.737715500 0.97363762 1.03   121
## 
## Process error parameter estimates:
##                   2.5%        50%     97.5% Rhat n.eff
## Sigma[1,1] 0.047399995 0.15732950 0.3260789 1.06    58
## Sigma[1,2] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[1,3] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[1,4] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[1,5] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[2,1] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[2,2] 0.007679224 0.04720345 0.2191654 1.10    35
## Sigma[2,3] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[2,4] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[2,5] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[3,1] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[3,2] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[3,3] 0.072888115 0.12608800 0.1990677 1.01   347
## Sigma[3,4] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[3,5] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[4,1] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[4,2] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[4,3] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[4,4] 0.083184102 0.20876500 0.3652301 1.08    60
## Sigma[4,5] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[5,1] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[5,2] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[5,3] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[5,4] 0.000000000 0.00000000 0.0000000  NaN   NaN
## Sigma[5,5] 0.044861035 0.11678550 0.2804763 1.05    59
## 
## GAM process model coefficient (beta) estimates:
##                                            2.5%           50%       97.5% Rhat
## te(temp,month).1_trend              -0.71099900 -0.2130370000  0.31335565 1.01
## te(temp,month).2_trend              -0.75658025 -0.0196095000  0.87432605 1.01
## te(temp,month).3_trend              -1.05740750 -0.3301185000  0.32748397 1.00
## te(temp,month).4_trend              -0.59469717 -0.2158295000  0.14640095 1.01
## te(temp,month).5_trend              -0.07129349  0.3039930000  0.75814242 1.01
## te(temp,month).6_trend              -0.37089268  0.0569641500  0.61284750 1.01
## te(temp,month).7_trend              -0.54614723 -0.2256700000  0.09005084 1.01
## te(temp,month).8_trend              -0.80821795 -0.1850125000  0.58608922 1.01
## te(temp,month).9_trend               0.08200151  0.4036360000  0.78820735 1.02
## te(temp,month).10_trend              0.21128825  0.6678310000  1.17684975 1.02
## te(temp,month).11_trend             -0.77189685 -0.1917800000  0.38488095 1.02
## te(temp,month).12_trend             -0.96894702 -0.2306785000  1.01376775 1.01
## te(temp,month).13_trend             -0.41728790  0.1354720000  0.76465117 1.00
## te(temp,month).14_trend              0.13327173  0.5725980000  1.01980375 1.01
## te(temp,month).15_trend             -1.20861200 -0.0851215000  1.10013375 1.02
## te(temp,month):trendtrend1.1_trend  -0.24302800  0.7181530000  1.52791950 1.01
## te(temp,month):trendtrend1.2_trend  -0.91311937  0.4879080000  1.90402550 1.00
## te(temp,month):trendtrend1.3_trend  -1.27515850 -0.3106690000  0.78940365 1.01
## te(temp,month):trendtrend1.4_trend  -0.54733610 -0.0125492500  0.68209162 1.01
## te(temp,month):trendtrend1.5_trend   0.52137713  1.2167900000  1.82681925 1.01
## te(temp,month):trendtrend1.6_trend  -0.78710398  0.0702674500  0.86538600 1.00
## te(temp,month):trendtrend1.7_trend  -0.71333865 -0.2613415000  0.19786130 1.01
## te(temp,month):trendtrend1.8_trend  -0.86333502 -0.0581366500  1.37108075 1.01
## te(temp,month):trendtrend1.9_trend   0.42158565  1.1330300000  1.76216250 1.02
## te(temp,month):trendtrend1.10_trend -1.09500375 -0.4555880000  0.24531112 1.01
## te(temp,month):trendtrend1.11_trend -1.23044200 -0.0386433500  0.64011422 1.01
## te(temp,month):trendtrend1.12_trend -1.20292275 -0.0954618500  1.75413800 1.01
## te(temp,month):trendtrend1.13_trend -0.79736965  0.4748545000  1.62208200 1.01
## te(temp,month):trendtrend1.14_trend -1.38584950 -0.7099340000 -0.04923621 1.02
## te(temp,month):trendtrend1.15_trend -1.99422350  0.4429710000  1.80239300 1.01
## te(temp,month):trendtrend2.1_trend  -0.39262825  0.1519265000  0.71653622 1.01
## te(temp,month):trendtrend2.2_trend  -0.45736985  0.1046660000  0.69670248 1.00
## te(temp,month):trendtrend2.3_trend  -0.57210860  0.0223266500  0.64436755 1.00
## te(temp,month):trendtrend2.4_trend  -0.64796827 -0.1443180000  0.25648672 1.00
## te(temp,month):trendtrend2.5_trend  -0.44956640  0.0072609300  0.51147727 1.02
## te(temp,month):trendtrend2.6_trend  -0.35361955  0.0186700500  0.38417730 1.00
## te(temp,month):trendtrend2.7_trend  -0.52096785 -0.1038940000  0.25983440 1.01
## te(temp,month):trendtrend2.8_trend  -0.69157295 -0.0915901500  0.45457920 1.00
## te(temp,month):trendtrend2.9_trend  -0.38614895  0.0025081800  0.40747242 1.01
## te(temp,month):trendtrend2.10_trend -0.56634350 -0.0495205000  0.44149625 1.01
## te(temp,month):trendtrend2.11_trend -0.57339845 -0.0783913500  0.37276495 1.00
## te(temp,month):trendtrend2.12_trend -0.69388478  0.0290373500  0.69028362 1.01
## te(temp,month):trendtrend2.13_trend -0.29178900  0.1251000000  0.66409720 1.00
## te(temp,month):trendtrend2.14_trend -0.24905188  0.1402650000  0.63223962 1.00
## te(temp,month):trendtrend2.15_trend -0.64944880  0.0773890500  0.88146450 1.00
## te(temp,month):trendtrend3.1_trend  -0.61350947 -0.0572425000  0.40289215 1.00
## te(temp,month):trendtrend3.2_trend  -0.71272913 -0.0798854500  0.42192155 1.00
## te(temp,month):trendtrend3.3_trend  -0.46043307  0.0915480000  0.72561487 1.00
## te(temp,month):trendtrend3.4_trend  -0.36656188  0.0028792150  0.38725237 1.01
## te(temp,month):trendtrend3.5_trend  -0.76511215 -0.2524720000  0.13269142 1.01
## te(temp,month):trendtrend3.6_trend  -0.52008455 -0.0982236000  0.24636327 1.00
## te(temp,month):trendtrend3.7_trend  -0.14913722  0.1839180000  0.53914595 1.01
## te(temp,month):trendtrend3.8_trend  -0.64403232 -0.0001922949  0.61603805 1.01
## te(temp,month):trendtrend3.9_trend  -0.71253013 -0.2417660000  0.11851687 1.01
## te(temp,month):trendtrend3.10_trend -0.49742188 -0.0258204500  0.40816445 1.01
## te(temp,month):trendtrend3.11_trend -0.17881643  0.2303780000  0.71694547 1.01
## te(temp,month):trendtrend3.12_trend -0.73374132  0.1059265000  0.80145840 1.01
## te(temp,month):trendtrend3.13_trend -0.70376655 -0.1301305000  0.29372898 1.01
## te(temp,month):trendtrend3.14_trend -0.56899765 -0.1036450000  0.31146275 1.00
## te(temp,month):trendtrend3.15_trend -0.56149095  0.1201615000  1.00273100 1.01
## te(temp,month):trendtrend4.1_trend  -0.94546890 -0.2964710000  0.20375927 1.01
## te(temp,month):trendtrend4.2_trend  -0.74747685 -0.1572850000  0.42700905 1.00
## te(temp,month):trendtrend4.3_trend  -0.76925390 -0.0699267000  0.47050937 1.00
## te(temp,month):trendtrend4.4_trend  -0.42540967 -0.0127230000  0.48415762 1.01
## te(temp,month):trendtrend4.5_trend  -0.47375273 -0.0119113000  0.41012855 1.01
## te(temp,month):trendtrend4.6_trend  -0.39045695 -0.0221083000  0.45294492 1.01
## te(temp,month):trendtrend4.7_trend  -0.44476683 -0.0521473000  0.29861760 1.01
## te(temp,month):trendtrend4.8_trend  -0.72156707 -0.0722574000  0.56435882 1.01
## te(temp,month):trendtrend4.9_trend  -0.19104732  0.1952870000  0.64633587 1.01
## te(temp,month):trendtrend4.10_trend -0.18120290  0.2838715000  0.95072765 1.02
## te(temp,month):trendtrend4.11_trend -0.66777910 -0.0957446500  0.39970645 1.01
## te(temp,month):trendtrend4.12_trend -0.99622532 -0.1908115000  0.62969342 1.01
## te(temp,month):trendtrend4.13_trend -0.46524175  0.0652123500  0.66044700 1.01
## te(temp,month):trendtrend4.14_trend -0.33466883  0.0931128000  0.64220122 1.01
## te(temp,month):trendtrend4.15_trend -1.18785150 -0.2452110000  0.54722610 1.01
## te(temp,month):trendtrend5.1_trend  -0.39520713  0.2231535000  1.03138125 1.00
## te(temp,month):trendtrend5.2_trend  -0.42919422  0.2673090000  1.50686850 1.01
## te(temp,month):trendtrend5.3_trend  -1.33868050 -0.4116280000  0.29518827 1.00
## te(temp,month):trendtrend5.4_trend  -0.67753098 -0.2230420000  0.25835682 1.01
## te(temp,month):trendtrend5.5_trend   0.03753810  0.5733865000  1.26949675 1.01
## te(temp,month):trendtrend5.6_trend  -0.17045192  0.2521555000  0.96063465 1.00
## te(temp,month):trendtrend5.7_trend  -0.60382588 -0.1871740000  0.21505442 1.01
## te(temp,month):trendtrend5.8_trend  -0.79398660 -0.1180965000  0.59303525 1.01
## te(temp,month):trendtrend5.9_trend  -0.09132313  0.3148105000  0.77097870 1.02
## te(temp,month):trendtrend5.10_trend -0.01954015  0.4850300000  1.17813400 1.02
## te(temp,month):trendtrend5.11_trend -0.67780830 -0.1254220000  0.39201765 1.01
## te(temp,month):trendtrend5.12_trend -0.90756330 -0.1463735000  0.66015102 1.00
## te(temp,month):trendtrend5.13_trend -0.76408973 -0.0208821000  0.54733222 1.00
## te(temp,month):trendtrend5.14_trend -0.06889556  0.3643395000  1.02134050 1.01
## te(temp,month):trendtrend5.15_trend -1.11173250 -0.1283090000  0.78747217 1.01
##                                     n.eff
## te(temp,month).1_trend                364
## te(temp,month).2_trend                297
## te(temp,month).3_trend                386
## te(temp,month).4_trend                240
## te(temp,month).5_trend                223
## te(temp,month).6_trend                248
## te(temp,month).7_trend                277
## te(temp,month).8_trend                353
## te(temp,month).9_trend                268
## te(temp,month).10_trend               171
## te(temp,month).11_trend               257
## te(temp,month).12_trend               477
## te(temp,month).13_trend               520
## te(temp,month).14_trend               294
## te(temp,month).15_trend               277
## te(temp,month):trendtrend1.1_trend    323
## te(temp,month):trendtrend1.2_trend    751
## te(temp,month):trendtrend1.3_trend    417
## te(temp,month):trendtrend1.4_trend    213
## te(temp,month):trendtrend1.5_trend    262
## te(temp,month):trendtrend1.6_trend    726
## te(temp,month):trendtrend1.7_trend    316
## te(temp,month):trendtrend1.8_trend    312
## te(temp,month):trendtrend1.9_trend    247
## te(temp,month):trendtrend1.10_trend   412
## te(temp,month):trendtrend1.11_trend   299
## te(temp,month):trendtrend1.12_trend   444
## te(temp,month):trendtrend1.13_trend   401
## te(temp,month):trendtrend1.14_trend   360
## te(temp,month):trendtrend1.15_trend   310
## te(temp,month):trendtrend2.1_trend    482
## te(temp,month):trendtrend2.2_trend    453
## te(temp,month):trendtrend2.3_trend    755
## te(temp,month):trendtrend2.4_trend    719
## te(temp,month):trendtrend2.5_trend    371
## te(temp,month):trendtrend2.6_trend    767
## te(temp,month):trendtrend2.7_trend    436
## te(temp,month):trendtrend2.8_trend   1096
## te(temp,month):trendtrend2.9_trend    412
## te(temp,month):trendtrend2.10_trend   337
## te(temp,month):trendtrend2.11_trend   747
## te(temp,month):trendtrend2.12_trend   719
## te(temp,month):trendtrend2.13_trend   844
## te(temp,month):trendtrend2.14_trend   715
## te(temp,month):trendtrend2.15_trend   571
## te(temp,month):trendtrend3.1_trend    568
## te(temp,month):trendtrend3.2_trend    688
## te(temp,month):trendtrend3.3_trend    504
## te(temp,month):trendtrend3.4_trend    258
## te(temp,month):trendtrend3.5_trend    287
## te(temp,month):trendtrend3.6_trend    670
## te(temp,month):trendtrend3.7_trend    306
## te(temp,month):trendtrend3.8_trend    245
## te(temp,month):trendtrend3.9_trend    261
## te(temp,month):trendtrend3.10_trend   301
## te(temp,month):trendtrend3.11_trend   404
## te(temp,month):trendtrend3.12_trend   289
## te(temp,month):trendtrend3.13_trend   545
## te(temp,month):trendtrend3.14_trend   443
## te(temp,month):trendtrend3.15_trend   468
## te(temp,month):trendtrend4.1_trend    543
## te(temp,month):trendtrend4.2_trend    715
## te(temp,month):trendtrend4.3_trend    685
## te(temp,month):trendtrend4.4_trend    419
## te(temp,month):trendtrend4.5_trend    500
## te(temp,month):trendtrend4.6_trend    412
## te(temp,month):trendtrend4.7_trend    742
## te(temp,month):trendtrend4.8_trend    496
## te(temp,month):trendtrend4.9_trend    353
## te(temp,month):trendtrend4.10_trend   171
## te(temp,month):trendtrend4.11_trend   410
## te(temp,month):trendtrend4.12_trend   347
## te(temp,month):trendtrend4.13_trend   727
## te(temp,month):trendtrend4.14_trend   432
## te(temp,month):trendtrend4.15_trend   370
## te(temp,month):trendtrend5.1_trend    344
## te(temp,month):trendtrend5.2_trend    337
## te(temp,month):trendtrend5.3_trend    276
## te(temp,month):trendtrend5.4_trend    297
## te(temp,month):trendtrend5.5_trend    194
## te(temp,month):trendtrend5.6_trend    349
## te(temp,month):trendtrend5.7_trend    325
## te(temp,month):trendtrend5.8_trend    407
## te(temp,month):trendtrend5.9_trend    285
## te(temp,month):trendtrend5.10_trend   182
## te(temp,month):trendtrend5.11_trend   488
## te(temp,month):trendtrend5.12_trend   853
## te(temp,month):trendtrend5.13_trend   959
## te(temp,month):trendtrend5.14_trend   312
## te(temp,month):trendtrend5.15_trend   551
## 
## GAM process smoothing parameter (rho) estimates:
##                                               2.5%        50%    97.5% Rhat
## te(temp,month)_rho_trend                1.71740600  3.2887300 4.218339 1.00
## te(temp,month)2_rho_trend              -0.03553962  2.3135250 4.014890 1.02
## te(temp,month)3_rho_trend              -0.82144222  2.1888450 4.081949 1.02
## te(temp,month):seriestrend1_rho_trend   0.67336710  2.6554700 3.965424 1.01
## te(temp,month):seriestrend12_rho_trend -2.45043375 -0.4959015 2.459001 1.01
## te(temp,month):seriestrend13_rho_trend -1.38823000  2.7040200 4.076347 1.01
## te(temp,month):seriestrend2_rho_trend   0.90970628  3.0970600 4.166231 1.00
## te(temp,month):seriestrend22_rho_trend  1.24707850  3.1438800 4.141240 1.00
## te(temp,month):seriestrend23_rho_trend  0.96413485  3.1454300 4.172891 1.00
## te(temp,month):seriestrend3_rho_trend   1.42559775  3.2381250 4.148147 1.00
## te(temp,month):seriestrend32_rho_trend  0.53902565  2.9125400 4.106233 1.01
## te(temp,month):seriestrend33_rho_trend  0.32923218  3.0100100 4.156952 1.01
## te(temp,month):seriestrend4_rho_trend   1.12914725  3.2116350 4.160849 1.01
## te(temp,month):seriestrend42_rho_trend  0.31757170  2.8511600 4.128165 1.01
## te(temp,month):seriestrend43_rho_trend  0.38299955  3.0479650 4.152090 1.00
## te(temp,month):seriestrend5_rho_trend   0.99937777  3.1421550 4.171079 1.01
## te(temp,month):seriestrend52_rho_trend -0.49333570  1.9588300 3.904954 1.01
## te(temp,month):seriestrend53_rho_trend  0.53241375  3.0059500 4.134600 1.00
##                                        n.eff
## te(temp,month)_rho_trend                 812
## te(temp,month)2_rho_trend                208
## te(temp,month)3_rho_trend                303
## te(temp,month):seriestrend1_rho_trend    445
## te(temp,month):seriestrend12_rho_trend   233
## te(temp,month):seriestrend13_rho_trend   260
## te(temp,month):seriestrend2_rho_trend    564
## te(temp,month):seriestrend22_rho_trend   504
## te(temp,month):seriestrend23_rho_trend   800
## te(temp,month):seriestrend3_rho_trend   1050
## te(temp,month):seriestrend32_rho_trend   265
## te(temp,month):seriestrend33_rho_trend   283
## te(temp,month):seriestrend4_rho_trend    410
## te(temp,month):seriestrend42_rho_trend   259
## te(temp,month):seriestrend43_rho_trend   563
## te(temp,month):seriestrend5_rho_trend    543
## te(temp,month):seriestrend52_rho_trend   210
## te(temp,month):seriestrend53_rho_trend   567
## 
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhats above 1.05 found for 40 parameters
## *Diagnose further to investigate why the chains have not mixed
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## Chain 3: E-FMI = 0.1924
## Chain 4: E-FMI = 0.1657
## *E-FMI below 0.2 indicates you may need to reparameterize your model

We can again plot the smooth functions, which this time operate on the process model. The coefficients for this model are now accessible through the trend_mgcv_model slot in the model object:

gratia::draw(var_mod$trend_mgcv_model, select = 1)

We can see the same plot using the mvgam version by setting trend_effects = TRUE in the plotting functions:

plot(var_mod, 'smooths', trend_effects = TRUE)

The VAR matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model:

mcmc_plot(var_mod, variable = 'A', regex = TRUE, type = 'hist')

Unfortunately, bayesplot doesn’t know this is a matrix of parameters, so what we see is actually the transpose of the VAR matrix. A little wrangling gives us these histograms in the correct order:

A_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    A_pars[i, j] <- paste0('A[', i, ',', j, ']')
  }
}
mcmc_plot(var_mod, 
          variable = as.vector(t(A_pars)), 
          type = 'hist')
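The reordering is needed because R stores matrices in column-major order: `as.vector()` walks down the columns, so transposing first makes the flattened vector read across the rows, which matches the panel layout we want. A quick base-R check (using a generic 3x3 label matrix, not the model object):

```r
# R flattens matrices column by column (column-major order)
m <- matrix(paste0('A[', rep(1:3, each = 3), ',', rep(1:3, times = 3), ']'),
            nrow = 3, byrow = TRUE)
as.vector(m)[1:3]     # walks the first column: "A[1,1]" "A[2,1]" "A[3,1]"
as.vector(t(m))[1:3]  # walks the first row:    "A[1,1]" "A[1,2]" "A[1,3]"
```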

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row at the next timestep. So for example, the effect in cell [1,3], which is quite strongly negative, means that an increase in the process for series 3 (Greens) at time \(t\) is expected to lead to a subsequent decrease in the process for series 1 (Bluegreens) at time \(t+1\). The latent process model now captures both these lagged effects and the smooth seasonal effects, so the trend plot shows our best estimate of what the true count should have been at each time point:

plot(var_mod, type = 'trend', series = 1)

plot(var_mod, type = 'trend', series = 3)
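To make the row/column convention concrete, here is a toy VAR(1) step with an invented 3x3 coefficient matrix (these values are purely illustrative, not the fitted estimates). A negative entry in cell [1,3] pushes process 1 downward whenever process 3 increases:

```r
# Toy VAR(1) update: x_t1 = A %*% x_t (+ process error, omitted here);
# the coefficients below are invented for illustration only
A <- matrix(c(0.5, 0.0, -0.4,   # row 1: process 1 responds negatively to process 3
              0.0, 0.3,  0.0,
              0.1, 0.0,  0.6),  # row 3: process 3 is persistent
            nrow = 3, byrow = TRUE)
x_t  <- c(0, 0, 1)              # perturb process 3 upward at time t
x_t1 <- as.vector(A %*% x_t)    # expected process values at time t + 1
x_t1[1]                         # process 1 is pulled down by A[1,3] = -0.4
```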

The process error matrix \(\Sigma\) captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for those elements will look like flat boxes:

Sigma_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']')
  }
}
mcmc_plot(var_mod, 
          variable = as.vector(t(Sigma_pars)), 
          type = 'hist')

The observation error estimates \(\sigma_{obs}\) represent how much the model thinks we might miss the true count when we take our imperfect measurements:

mcmc_plot(var_mod, variable = 'sigma_obs', regex = TRUE, type = 'hist')

These estimates remain somewhat difficult to identify, as state-space models often struggle to separate observation error from process error.

Correlated process errors

Let’s see if the estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors:

priors <- get_mvgam_priors(
  # observation formula, which just uses an intercept
  y ~ 1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend),
  
  # VAR1 model with correlated process errors
  trend_model = 'VAR1cor',
  family = gaussian(),
  data = plankton_train)

# update the sigma_obs prior so that it avoids very small
# and very large values that are nonsensical
priors$prior[11] <- "sigma_obs ~ uniform(0.15, 1);"
priors$new_lowerbound[11] <- 0.15
priors$new_upperbound[11] <- 1
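To see what this constraint does, we can draw from the updated prior directly: the uniform(0.15, 1) bounds keep \(\sigma_{obs}\) away from implausibly small or large values. These are plain runif() draws just to visualise the prior, not part of the model fitting:

```r
# Draws from the truncated prior placed on sigma_obs
set.seed(1)
prior_draws <- runif(1000, min = 0.15, max = 1)
range(prior_draws)  # all draws stay within [0.15, 1]
hist(prior_draws, main = 'sigma_obs prior', xlab = expression(sigma[obs]))
```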

And now we can fit the correlated process error model. Note that I am drawing fewer posterior samples for this model so that it is easier to work with when completing the exercises:

varcor_mod <- mvgam(  
  # observation formula, which just uses an intercept
  y ~ 1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend),
  
  # VAR1 model with correlated process errors
  trend_model = 'VAR1cor',
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  
  # include the updated priors
  priors = priors,
  
  # use reduced samples for inclusion in tutorial data
  samples = 100)

Plotting convergence diagnostics for the two models shows that the more complex model with correlated errors has converged better:

mcmc_plot(varcor_mod, type = 'rhat') +
  labs(title = 'VAR1cor')

mcmc_plot(var_mod, type = 'rhat') +
  labs(title = 'VAR1')

We can also check convergence using one of the package’s inbuilt utility functions:

mvgam:::check_rhat(varcor_mod$model_output)
## Rhats above 1.05 found for 20 parameters
## *Diagnose further to investigate why the chains have not mixed
mvgam:::check_rhat(var_mod$model_output)
## Rhats above 1.05 found for 40 parameters
## *Diagnose further to investigate why the chains have not mixed

This model has converged better than the first model (which assumed independent process errors), possibly indicating that it is a more appropriate model. The \(\Sigma\) matrix now captures any evidence of contemporaneously correlated process errors:

Sigma_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']')
  }
}
mcmc_plot(varcor_mod, 
          variable = as.vector(t(Sigma_pars)), 
          type = 'hist')
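Raw covariances can be hard to compare across series, so it often helps to standardise them to correlations with `cov2cor()`. A toy example with an assumed 2x2 covariance matrix (invented values); the same transformation applies to each posterior draw of \(\Sigma\):

```r
# Assumed covariance matrix (illustrative values only)
Sigma_toy <- matrix(c(0.50, -0.20,
                     -0.20,  0.40), nrow = 2)
# cov2cor() divides each covariance by the product of the two
# standard deviations: -0.2 / sqrt(0.5 * 0.4) is about -0.45
cov2cor(Sigma_toy)
```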

The estimates for this symmetric \(\Sigma\) matrix tell us there is support for correlated process errors. For example, series 1 and 3 (Bluegreens and Greens) show negatively correlated process errors, while series 1 and 4 (Bluegreens and Other.algae) show positively correlated errors. Because this model can capture correlated errors, the VAR matrix has also changed slightly:

A_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    A_pars[i, j] <- paste0('A[', i, ',', j, ']')
  }
}
mcmc_plot(varcor_mod, 
          variable = as.vector(t(A_pars)), 
          type = 'hist')

We still have some evidence of lagged cross-dependence, but some of these interactions have now been pulled closer to zero. But which model is better? We can compute the variogram score for out-of-sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1
# model is better
diff_scores <- score(fcvarcor, score = 'variogram')$all_series$score -
  score(fcvar, score = 'variogram')$all_series$score
plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', 
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(variogram[VAR1cor]~-~variogram[VAR1]))
abline(h = 0, lty = 'dashed')
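One simple way to summarise such a comparison is to count how many horizons favour each model. The score differences below are invented for illustration; with the real objects you would use the diff_scores vector computed above:

```r
# Hypothetical variogram score differences across 10 forecast horizons
# (negative favours VAR1cor, positive favours VAR1; NA = missing scores)
diff_scores_toy <- c(-0.3, -0.1, 0.05, -0.2, NA, -0.4, 0.1, -0.15, -0.05, NA)
sum(diff_scores_toy < 0, na.rm = TRUE)  # horizons where VAR1cor is better
sum(diff_scores_toy > 0, na.rm = TRUE)  # horizons where VAR1 is better
```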

The VAR1cor model seems to do a better job overall, but there are some horizons where the uncorrelated error model gives slightly better forecasts. This is often the case when comparing models: some models give better forecasts some of the time, while others do better in other contexts. But overall, I’d choose the second model here, as it converged better and tends to give better community-level forecasts.

Exercises

  1. Plot conditional effects of month and temperature for each algal group. Hint: see the documentation in ?marginaleffects::plot_predictions for guidance
  2. Calculate the posterior median process error correlation matrix. Which two algal groups are the most negatively correlated in their process errors? Hint: see ?mvgam::as.matrix.mvgam for guidance on extracting posterior estimates of \(\Sigma\). Each posterior draw will be a vector of length 25, so you will need to convert that vector to a matrix. Then see ?cov2cor for guidance on calculating a correlation matrix from a covariance matrix.
Check here for template code if you’re having trouble plotting conditional effects by algal group
# Replace the ? with the correct value(s)
# You can use 'plot_predictions' to generate conditional effects plots
# that are stratified over a number of variables (up to three at once).
# This will feed a particular grid of 'newdata' to the 'predict.mvgam' 
# function, returning conditional predictions on the response scale
?marginaleffects::plot_predictions
plot_predictions(varcor_mod,
                 condition = c(?, ?, ?),
                 conf_level = 0.8)


Check here for template code if you’re having trouble calculating the posterior median error correlation matrix
# Replace the ? with the correct value(s)
# You can use 'as.matrix.mvgam' to extract posterior estimates of the process error matrix Sigma
?mvgam::as.matrix.mvgam
Sigmas <- as.matrix(varcor_mod, variable = ?, regex = ?)

# Each draw will be a vector of length 25 (representing the 5x5
# covariance matrix); calculate posterior medians for each column
?apply
?median
median_Sigma <- apply(Sigmas, ?, ?)

# Convert to a correlation matrix
?cov2cor
cov2cor(matrix(median_Sigma, nrow = 5, ncol = 5))

Session Info

sessionInfo()
## R version 4.2.3 (2023-03-15 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.4.2          gratia_0.8.1           bayesplot_1.10.0      
##  [4] tidybayes_3.0.4        mvgam_1.0.5            insight_0.19.3        
##  [7] marginaleffects_0.13.0 brms_2.19.0            Rcpp_1.0.10           
## [10] mgcv_1.8-42            nlme_3.1-162           dplyr_1.1.1           
## [13] downloadthis_0.3.2    
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.1-0     ggridges_0.5.4       ellipsis_0.3.2      
##   [4] markdown_1.7         base64enc_0.1-3      fs_1.6.2            
##   [7] rstudioapi_0.14      farver_2.1.1         rstan_2.21.8        
##  [10] svUnit_1.0.6         DT_0.28              fansi_1.0.4         
##  [13] mvtnorm_1.2-2        lubridate_1.9.2      bridgesampling_1.1-2
##  [16] codetools_0.2-19     splines_4.2.3        cachem_1.0.8        
##  [19] knitr_1.43           shinythemes_1.2.0    jsonlite_1.8.7      
##  [22] bsplus_0.1.4         ggdist_3.3.0         shiny_1.7.4         
##  [25] compiler_4.2.3       backports_1.4.1      assertthat_0.2.1    
##  [28] Matrix_1.5-3         fastmap_1.1.1        cli_3.6.1           
##  [31] later_1.3.1          htmltools_0.5.5      prettyunits_1.1.1   
##  [34] tools_4.2.3          igraph_1.5.0         coda_0.19-4         
##  [37] gtable_0.3.3         glue_1.6.2           reshape2_1.4.4      
##  [40] posterior_1.4.1      jquerylib_0.1.4      vctrs_0.6.1         
##  [43] crosstalk_1.2.0      tensorA_0.36.2       xfun_0.39           
##  [46] stringr_1.5.0        ps_1.7.5             collapse_1.9.6      
##  [49] timechange_0.2.0     mime_0.12            miniUI_0.1.1.1      
##  [52] lifecycle_1.0.3      gtools_3.9.4         klippy_0.0.0.9500   
##  [55] MASS_7.3-58.2        zoo_1.8-12           scales_1.2.1        
##  [58] colourpicker_1.2.0   promises_1.2.0.1     Brobdingnag_1.2-9   
##  [61] parallel_4.2.3       inline_0.3.19        RColorBrewer_1.1-3  
##  [64] shinystan_2.6.0      mvnfast_0.2.8        yaml_2.3.7          
##  [67] gridExtra_2.3        loo_2.6.0            StanHeaders_2.26.26 
##  [70] sass_0.4.7           stringi_1.7.12       highr_0.10          
##  [73] dygraphs_1.1.1.6     checkmate_2.2.0      pkgbuild_1.4.2      
##  [76] zip_2.3.0            rlang_1.1.1          pkgconfig_2.0.3     
##  [79] matrixStats_1.0.0    distributional_0.3.2 evaluate_0.21       
##  [82] lattice_0.20-45      purrr_1.0.1          labeling_0.4.2      
##  [85] patchwork_1.1.2      rstantools_2.3.1     htmlwidgets_1.6.2   
##  [88] tidyselect_1.2.0     processx_3.8.1       plyr_1.8.8          
##  [91] magrittr_2.0.3       R6_2.5.1             generics_0.1.3      
##  [94] withr_2.5.0          pillar_1.9.0         xts_0.13.1          
##  [97] abind_1.4-5          tibble_3.2.1         crayon_1.5.2        
## [100] arrayhelpers_1.1-0   utf8_1.2.3           rmarkdown_2.23      
## [103] isoband_0.2.7        grid_4.2.3           data.table_1.14.8   
## [106] callr_3.7.3          threejs_0.3.3        digest_0.6.31       
## [109] xtable_1.8-4         tidyr_1.3.0          httpuv_1.6.11       
## [112] scoringRules_1.0.2   RcppParallel_5.1.7   stats4_4.2.3        
## [115] munsell_0.5.0        bslib_0.5.0          shinyjs_2.1.0